Take-home Exercise 3: Provincial Competitiveness Index influence on FDI in Vietnam
Introduction
Provincial Competitiveness Index in Vietnam
Context: Vietnam’s provinces vary significantly in competitiveness, as captured by the Provincial Competitiveness Index (PCI). This index evaluates key dimensions such as entry costs, land access, transparency, and labor policies, which influence the investment climate and economic potential of each region.
Challenges: Provinces aiming to attract investment face challenges related to regional disparities and governance effectiveness. Understanding PCI dimensions is essential for identifying strengths and areas for improvement.
Analysis Focus
Objectives: This analysis aims to evaluate PCI dimensions through linear regression, examining their relationship with the number of FDI projects and FDI registered capital to identify the combinations that drive investment.
Goals:
Identify Key Factors: Determine which PCI dimensions most influence the total number of FDI projects and total FDI registered capital.
Province-Specific Insights: Highlight PCI factors lacking in specific provinces to guide policymaking.
Actionable Recommendations: Provide targeted suggestions for enhancing PCI dimensions to improve the investment climate.
Significance
This project will analyze how PCI dimensions affect Vietnam’s economic landscape, offering actionable insights to help policymakers enhance regional competitiveness and stimulate sustainable development.
1.0 Setup
1.1 Installing R-Packages
sf: For handling spatial vector data as simple features (sf) objects. Examples include st_read() for importing spatial data and st_transform() for coordinate reference system transformations.
tidyverse: For data manipulation and transformation, including functions for working with tibble data frames.
readr: For reading CSV or other text-based data files.
openxlsx, readxl: For reading and writing XLSX files.
dplyr: For data manipulation (e.g. grouping and summarizing the relationships between columns).
knitr, gtsummary: For styling tables.
tmap: For creating thematic maps.
animation, png, magick: For animation work.
sfdep: For performing both local and global spatial autocorrelation analysis.
ggstatsplot: For visualizing relationships with statistical details.
olsrr: For building OLS regression models and performing diagnostic tests.
performance: For visually comparing models.
GWmodel: For calibrating geographically weighted models, such as geographically weighted regression.
pacman::p_load(tidyverse, sf, readr, tmap, dplyr, knitr, animation, png, magick, openxlsx, readxl, sfdep, ggstatsplot, olsrr, performance, gtsummary, GWmodel)
1.2 Data Acquisition
We will be using these datasets:
- Source: Vietnam - Subnational Administrative Boundaries at HDX
  - Province Boundaries: To map scores from the various analyses onto geographic boundaries, enabling visualization of regional competitiveness patterns across the country.
- Source: Vietnam Statistics Office, Provincial Competitiveness Index
  - Provincial Competitiveness Index (PCI): To evaluate the competitive environment of each province, identifying strengths and weaknesses that influence investment potential.
  - Foreign Direct Investment (FDI): To assess the attractiveness of provinces for foreign investors and identify trends in investment across different sectors.
1.3 Data Preparation and Wrangling
provincial_boundaries <- st_read(dsn = "data/boundaries/provincial", layer="geoBoundaries-VNM-ADM1")
class(provincial_boundaries)
st_crs(provincial_boundaries)
provincial_boundaries <- provincial_boundaries %>%
st_transform(crs = 3405) # Project to VN-2000 / UTM zone 48N (EPSG:3405), units in meters
# Drop & Rename column
provincial_boundaries <- provincial_boundaries %>%
select(shapeName, shapeISO, shapeGroup, geometry) %>%
rename(
province_vn = shapeName,
province_code = shapeISO,
country_code = shapeGroup
)
# Create a new column 'province_en' based on 'province_code'
provincial_boundaries <- provincial_boundaries %>%
mutate(province_en = case_when(
province_code == "VN-44" ~ "An Giang",
province_code == "VN-43" ~ "BRVT",
province_code == "VN-54" ~ "Bac Giang",
province_code == "VN-53" ~ "Bac Kan",
province_code == "VN-55" ~ "Bac Lieu",
province_code == "VN-56" ~ "Bac Ninh",
province_code == "VN-50" ~ "Ben Tre",
province_code == "VN-31" ~ "Binh Dinh",
province_code == "VN-57" ~ "Binh Duong",
province_code == "VN-58" ~ "Binh Phuoc",
province_code == "VN-40" ~ "Binh Thuan",
province_code == "VN-59" ~ "Ca Mau",
province_code == "VN-CT" ~ "Can Tho",
province_code == "VN-04" ~ "Cao Bang",
province_code == "VN-DN" ~ "Da Nang",
province_code == "VN-33" ~ "Dak Lak",
province_code == "VN-72" ~ "Dak Nong",
province_code == "VN-71" ~ "Dien Bien",
province_code == "VN-39" ~ "Dong Nai",
province_code == "VN-45" ~ "Dong Thap",
province_code == "VN-30" ~ "Gia Lai",
province_code == "VN-SG" ~ "HCMC",
province_code == "VN-03" ~ "Ha Giang",
province_code == "VN-63" ~ "Ha Nam",
province_code == "VN-HN" ~ "Ha Noi",
province_code == "VN-23" ~ "Ha Tinh",
province_code == "VN-61" ~ "Hai Duong",
province_code == "VN-HP" ~ "Hai Phong",
province_code == "VN-73" ~ "Hau Giang",
province_code == "VN-14" ~ "Hoa Binh",
province_code == "VN-66" ~ "Hung Yen",
province_code == "VN-34" ~ "Khanh Hoa",
province_code == "VN-47" ~ "Kien Giang",
province_code == "VN-28" ~ "Kon Tum",
province_code == "VN-01" ~ "Lai Chau",
province_code == "VN-35" ~ "Lam Dong",
province_code == "VN-09" ~ "Lang Son",
province_code == "VN-02" ~ "Lao Cai",
province_code == "VN-41" ~ "Long An",
province_code == "VN-67" ~ "Nam Dinh",
province_code == "VN-22" ~ "Nghe An",
province_code == "VN-18" ~ "Ninh Binh",
province_code == "VN-36" ~ "Ninh Thuan",
province_code == "VN-68" ~ "Phu Tho",
province_code == "VN-32" ~ "Phu Yen",
province_code == "VN-24" ~ "Quang Binh",
province_code == "VN-27" ~ "Quang Nam",
province_code == "VN-29" ~ "Quang Ngai",
province_code == "VN-13" ~ "Quang Ninh",
province_code == "VN-25" ~ "Quang Tri",
province_code == "VN-52" ~ "Soc Trang",
province_code == "VN-05" ~ "Son La",
province_code == "VN-26" ~ "TT-Hue",
province_code == "VN-37" ~ "Tay Ninh",
province_code == "VN-20" ~ "Thai Binh",
province_code == "VN-69" ~ "Thai Nguyen",
province_code == "VN-21" ~ "Thanh Hoa",
province_code == "VN-46" ~ "Tien Giang",
province_code == "VN-51" ~ "Tra Vinh",
province_code == "VN-07" ~ "Tuyen Quang",
province_code == "VN-49" ~ "Vinh Long",
province_code == "VN-70" ~ "Vinh Phuc",
province_code == "VN-06" ~ "Yen Bai"
)) %>%
select (province_en, everything())
write_rds(provincial_boundaries, "data/rds/provincial_boundaries.rds")
The coordinate reference system of provincial_boundaries is EPSG:4326, which is measured in degrees. We therefore transform it to a projected CRS measured in meters.
We also need an English name for each province so that the province boundaries can be matched with the other datasets.
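A degree does not correspond to a fixed ground distance, which is why the boundaries are projected before any distance-based work. The sketch below illustrates this with a simple spherical approximation. It assumes about 111.32 km per degree of longitude at the equator, shrinking with the cosine of latitude. These figures are an assumption of this example and are not taken from the data. Across Vietnam's latitude range, one degree of longitude spans roughly 110 km in the south but only about 102 km in the north.

```r
# Spherical approximation (illustrative assumption): one degree of longitude
# spans ~111.32 km at the equator and shrinks with the cosine of latitude.
km_per_degree_longitude <- function(lat_deg) {
  111.32 * cos(lat_deg * pi / 180)
}

# Approximate latitudes near Vietnam's southern tip (~9 N) and northern border (~23 N)
lats <- c(south = 9, north = 23)
round(km_per_degree_longitude(lats), 1)
```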
pci_2021 <- read_xlsx("data/provincial_competitiveness_index/2021.xlsx")
pci_2021 <- pci_2021 %>%
mutate(
province_code = case_when(
province_en == "An Giang" ~ "VN-44",
province_en == "BRVT" ~ "VN-43",
province_en == "Bac Giang" ~ "VN-54",
province_en == "Bac Kan" ~ "VN-53",
province_en == "Bac Lieu" ~ "VN-55",
province_en == "Bac Ninh" ~ "VN-56",
province_en == "Ben Tre" ~ "VN-50",
province_en == "Binh Dinh" ~ "VN-31",
province_en == "Binh Duong" ~ "VN-57",
province_en == "Binh Phuoc" ~ "VN-58",
province_en == "Binh Thuan" ~ "VN-40",
province_en == "Ca Mau" ~ "VN-59",
province_en == "Can Tho" ~ "VN-CT",
province_en == "Cao Bang" ~ "VN-04",
province_en == "Da Nang" ~ "VN-DN",
province_en == "Dak Lak" ~ "VN-33",
province_en == "Dak Nong" ~ "VN-72",
province_en == "Dien Bien" ~ "VN-71",
province_en == "Dong Nai" ~ "VN-39",
province_en == "Dong Thap" ~ "VN-45",
province_en == "Gia Lai" ~ "VN-30",
province_en == "HCMC" ~ "VN-SG",
province_en == "Ha Giang" ~ "VN-03",
province_en == "Ha Nam" ~ "VN-63",
province_en == "Ha Noi" ~ "VN-HN",
province_en == "Ha Tinh" ~ "VN-23",
province_en == "Hai Duong" ~ "VN-61",
province_en == "Hai Phong" ~ "VN-HP",
province_en == "Hau Giang" ~ "VN-73",
province_en == "Hoa Binh" ~ "VN-14",
province_en == "Hung Yen" ~ "VN-66",
province_en == "Khanh Hoa" ~ "VN-34",
province_en == "Kien Giang" ~ "VN-47",
province_en == "Kon Tum" ~ "VN-28",
province_en == "Lai Chau" ~ "VN-01",
province_en == "Lam Dong" ~ "VN-35",
province_en == "Lang Son" ~ "VN-09",
province_en == "Lao Cai" ~ "VN-02",
province_en == "Long An" ~ "VN-41",
province_en == "Nam Dinh" ~ "VN-67",
province_en == "Nghe An" ~ "VN-22",
province_en == "Ninh Binh" ~ "VN-18",
province_en == "Ninh Thuan" ~ "VN-36",
province_en == "Phu Tho" ~ "VN-68",
province_en == "Phu Yen" ~ "VN-32",
province_en == "Quang Binh" ~ "VN-24",
province_en == "Quang Nam" ~ "VN-27",
province_en == "Quang Ngai" ~ "VN-29",
province_en == "Quang Ninh" ~ "VN-13",
province_en == "Quang Tri" ~ "VN-25",
province_en == "Soc Trang" ~ "VN-52",
province_en == "Son La" ~ "VN-05",
province_en == "TT-Hue" ~ "VN-26",
province_en == "Tay Ninh" ~ "VN-37",
province_en == "Thai Binh" ~ "VN-20",
province_en == "Thai Nguyen" ~ "VN-69",
province_en == "Thanh Hoa" ~ "VN-21",
province_en == "Tien Giang" ~ "VN-46",
province_en == "Tra Vinh" ~ "VN-51",
province_en == "Tuyen Quang" ~ "VN-07",
province_en == "Vinh Long" ~ "VN-49",
province_en == "Vinh Phuc" ~ "VN-70",
province_en == "Yen Bai" ~ "VN-06",
)
) %>%
select(province_en, province_code, everything())
write.xlsx(pci_2021, "data/rds/pci_2021.xlsx")
fdi <- read_xlsx("data/fdi.xlsx")
# Rename columns
colnames(fdi) <- c("province_en", "total_project_count",
"total_registered_capital")
# Remove the first two rows
fdi <- fdi[-c(1, 2), ]
fdi <- fdi %>%
mutate(
province_code = case_when(
province_en == "An Giang" ~ "VN-44",
province_en == "Ba Ria - Vung Tau" ~ "VN-43",
province_en == "Bac Giang" ~ "VN-54",
province_en == "Bac Kan" ~ "VN-53",
province_en == "Bac Lieu" ~ "VN-55",
province_en == "Bac Ninh" ~ "VN-56",
province_en == "Ben Tre" ~ "VN-50",
province_en == "Binh Dinh" ~ "VN-31",
province_en == "Binh Duong" ~ "VN-57",
province_en == "Binh Phuoc" ~ "VN-58",
province_en == "Binh Thuan" ~ "VN-40",
province_en == "Ca Mau" ~ "VN-59",
province_en == "Can Tho" ~ "VN-CT",
province_en == "Cao Bang" ~ "VN-04",
province_en == "Da Nang" ~ "VN-DN",
province_en == "Dak Lak" ~ "VN-33",
province_en == "Dak Nong" ~ "VN-72",
province_en == "Dien Bien" ~ "VN-71",
province_en == "Dong Nai" ~ "VN-39",
province_en == "Dong Thap" ~ "VN-45",
province_en == "Gia Lai" ~ "VN-30",
province_en == "Ho Chi Minh city" ~ "VN-SG",
province_en == "Ha Giang" ~ "VN-03",
province_en == "Ha Nam" ~ "VN-63",
province_en == "Ha Noi" ~ "VN-HN",
province_en == "Ha Tinh" ~ "VN-23",
province_en == "Hai Duong" ~ "VN-61",
province_en == "Hai Phong" ~ "VN-HP",
province_en == "Hau Giang" ~ "VN-73",
province_en == "Hoa Binh" ~ "VN-14",
province_en == "Hung Yen" ~ "VN-66",
province_en == "Khanh Hoa" ~ "VN-34",
province_en == "Kien Giang" ~ "VN-47",
province_en == "Kon Tum" ~ "VN-28",
province_en == "Lai Chau" ~ "VN-01",
province_en == "Lam Dong" ~ "VN-35",
province_en == "Lang Son" ~ "VN-09",
province_en == "Lao Cai" ~ "VN-02",
province_en == "Long An" ~ "VN-41",
province_en == "Nam Dinh" ~ "VN-67",
province_en == "Nghe An" ~ "VN-22",
province_en == "Ninh Binh" ~ "VN-18",
province_en == "Ninh Thuan" ~ "VN-36",
province_en == "Phu Tho" ~ "VN-68",
province_en == "Phu Yen" ~ "VN-32",
province_en == "Quang Binh" ~ "VN-24",
province_en == "Quang Nam" ~ "VN-27",
province_en == "Quang Ngai" ~ "VN-29",
province_en == "Quang Ninh" ~ "VN-13",
province_en == "Quang Tri" ~ "VN-25",
province_en == "Soc Trang" ~ "VN-52",
province_en == "Son La" ~ "VN-05",
province_en == "Thua Thien-Hue" ~ "VN-26",
province_en == "Tay Ninh" ~ "VN-37",
province_en == "Thai Binh" ~ "VN-20",
province_en == "Thai Nguyen" ~ "VN-69",
province_en == "Thanh Hoa" ~ "VN-21",
province_en == "Tien Giang" ~ "VN-46",
province_en == "Tra Vinh" ~ "VN-51",
province_en == "Tuyen Quang" ~ "VN-07",
province_en == "Vinh Long" ~ "VN-49",
province_en == "Vinh Phuc" ~ "VN-70",
province_en == "Yen Bai" ~ "VN-06",
)
) %>%
select(province_en, province_code, everything())
fdi <- fdi %>%
left_join(provincial_boundaries, by = "province_code") %>%
select(province_en.x, province_code, total_project_count, total_registered_capital, geometry) %>%
rename(province_en = province_en.x)
write_rds(fdi, "data/rds/fdi.rds")
The PCI_2021 dataset was inconsistent, so I created a new sheet called ‘summary’ and renamed the old one to ‘summary - old’. The new sheet uses Excel's XLOOKUP function to populate data quickly from the old sheet. This is much faster than handling it in R, where different sets of code would be needed to manage the various data types.
For the economy_pie dataset, we also performed simple data reformatting from the ‘Summary -old’ sheet, shown in the ‘Summary’ sheet.
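The three case_when() blocks above repeat essentially the same 63-province mapping, differing only in a few spellings. Any name that fails to match silently becomes NA. One alternative keeps every alias in a single named lookup vector and warns about unmatched names. This is a sketch under that assumption: province_lookup and map_province_code() are hypothetical names, and only an excerpt of the provinces is shown. The aliases included are the spellings that differ between the PCI and FDI workbooks used here.

```r
# Hypothetical excerpt of a single lookup table; a full version would list
# all 63 provinces plus any alternative spellings used by each source.
province_lookup <- c(
  "An Giang"          = "VN-44",
  "BRVT"              = "VN-43",  # PCI spelling
  "Ba Ria - Vung Tau" = "VN-43",  # FDI spelling
  "HCMC"              = "VN-SG",  # PCI spelling
  "Ho Chi Minh city"  = "VN-SG",  # FDI spelling
  "TT-Hue"            = "VN-26",  # PCI spelling
  "Thua Thien-Hue"    = "VN-26"   # FDI spelling
)

# Map English province names to ISO codes, warning about names not in the lookup
map_province_code <- function(province_en, lookup = province_lookup) {
  codes <- unname(lookup[province_en])
  unmatched <- unique(province_en[is.na(codes)])
  if (length(unmatched) > 0) {
    warning("No province code for: ", paste(unmatched, collapse = ", "))
  }
  codes
}

map_province_code(c("HCMC", "Ho Chi Minh city", "An Giang"))
```

With a complete lookup, `mutate(province_code = map_province_code(province_en))` could stand in for the case_when() blocks used for pci_2021 and fdi.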
2.0 Importing the clean set of data
provincial_boundaries <- read_rds("data/rds/provincial_boundaries.rds")
pci_2021 <- read_xlsx("data/rds/pci_2021.xlsx")
fdi <- read_rds("data/rds/fdi.rds")
3.0 Prioritization Analysis for Provincial Development: Identifying Key Predictors
3.1 Correlation Matrix
The PCI consists of nine dimensions, each serving as an independent variable with varying degrees of influence on FDI data.
Given that some dimensions may exhibit high correlation with one another, it is essential to identify these correlated pairs and select only one variable from each pair for analysis.
To achieve this, we compute a correlation matrix to assess the relationships between the dimensions.
ggcorrmat(pci_2021[,4:13])
Interpretation
A pair of dimensions with an absolute correlation coefficient above 0.8 is considered highly correlated.
We found no pair that is highly correlated. We will reconfirm this later with the check in [4.6 Checking for multicollinearity].
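The 0.8 rule can also be applied programmatically instead of reading it off the plot. The helper below lists every pair of numeric columns whose absolute Pearson correlation exceeds a threshold. Its name, high_corr_pairs(), is hypothetical and it is not part of ggstatsplot. The built-in mtcars data stands in for the PCI columns.

```r
# List column pairs whose absolute Pearson correlation exceeds a threshold.
# Assumes all columns of df are numeric.
high_corr_pairs <- function(df, threshold = 0.8) {
  cm <- cor(df, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r = cm[idx],
    row.names = NULL
  )
}

high_corr_pairs(mtcars[, c("mpg", "cyl", "disp", "hp", "wt")])
```

If the ggcorrmat() reading above holds, `high_corr_pairs(pci_2021[, 4:13])` would be expected to return zero rows.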
3.2 Conduct Linear Regression
To explore the influence of each PCI dimension on FDI, we begin with a linear regression model. This initial model will help us determine the relationship between each independent variable (PCI dimensions) and FDI.
By examining the direction and size of each coefficient, we can start to understand the general influence of each dimension. This setup provides a foundation for refining our analysis and identifying key predictors in subsequent steps.
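Raw coefficients are hard to compare because the PCI dimensions and the FDI outcomes are on different scales. The "Std. Beta" column in the olsrr output addresses this. A common definition of the standardized coefficient is b × sd(x) / sd(y), and the printed values appear consistent with it. The sketch below shows that this equals the slope from a regression on z-scored variables. It uses the built-in mtcars data as a stand-in for pci_2021.

```r
# Standardized coefficient = raw slope * sd(x) / sd(y)
fit <- lm(mpg ~ wt + hp, data = mtcars)
raw_beta <- coef(fit)[c("wt", "hp")]
std_beta <- raw_beta * sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)

# The same values come from refitting on z-scored (mean 0, sd 1) variables
z <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
fit_z <- lm(mpg ~ wt + hp, data = z)

std_beta
coef(fit_z)[c("wt", "hp")]
```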
pci_2021 <- pci_2021 %>%
left_join(fdi %>%
select(province_code, total_project_count, total_registered_capital),
by = "province_code")
pci_2021$total_registered_capital <- as.numeric(as.character(pci_2021$total_registered_capital))
pci_2021$total_project_count <- as.numeric(as.character(pci_2021$total_project_count))
pci_project_mlr <- lm(formula = total_project_count ~ `Entry Costs` +
`Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021)
ols_regress(pci_project_mlr)
Model Summary
---------------------------------------------------------------------
R 0.645 RMSE 1302.611
R-Squared 0.416 MSE 2011017.608
Adj. R-Squared 0.319 Coef. Var 246.439
Pred R-Squared 0.125 AIC 1121.656
MAE 825.133 SBC 1145.404
---------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
------------------------------------------------------------------------
Regression 77389878.935 9 8598875.437 4.276 3e-04
Residual 108594950.815 54 2011017.608
Total 185984829.750 63
------------------------------------------------------------------------
Parameter Estimates
-------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-------------------------------------------------------------------------------------------------------------
(Intercept) 985.260 4334.255 0.227 0.821 -7704.398 9674.917
`Entry Costs` -578.238 362.995 -0.181 -1.593 0.117 -1305.998 149.523
`Land Access` 219.114 451.280 0.061 0.486 0.629 -685.648 1123.877
Transparency -335.325 326.078 -0.123 -1.028 0.308 -989.073 318.422
`Time Costs` 459.403 340.015 0.204 1.351 0.182 -222.286 1141.091
`Informal charges` -453.756 386.112 -0.184 -1.175 0.245 -1227.864 320.351
Proactivity -189.001 393.744 -0.064 -0.480 0.633 -978.409 600.407
`Business Support Policy` 681.895 246.477 0.310 2.767 0.008 187.738 1176.052
`Labor Policy` 846.539 279.910 0.361 3.024 0.004 285.354 1407.724
`Law & Order` -632.826 443.014 -0.210 -1.428 0.159 -1521.015 255.363
-------------------------------------------------------------------------------------------------------------
tbl_regression(pci_project_mlr,
intercept = TRUE) %>%
add_glance_source_note(
label = list(sigma ~ "\U03C3"),
include = c(r.squared, adj.r.squared,
AIC, statistic,
p.value, sigma))
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| (Intercept) | 985 | -7,704, 9,675 | 0.8 |
| Entry Costs | -578 | -1,306, 150 | 0.12 |
| Land Access | 219 | -686, 1,124 | 0.6 |
| Transparency | -335 | -989, 318 | 0.3 |
| Time Costs | 459 | -222, 1,141 | 0.2 |
| Informal charges | -454 | -1,228, 320 | 0.2 |
| Proactivity | -189 | -978, 600 | 0.6 |
| Business Support Policy | 682 | 188, 1,176 | 0.008 |
| Labor Policy | 847 | 285, 1,408 | 0.004 |
| Law & Order | -633 | -1,521, 255 | 0.2 |

R² = 0.416; Adjusted R² = 0.319; AIC = 1,122; Statistic = 4.28; p-value < 0.001; σ = 1,418
¹ CI = Confidence Interval
Model Summary
Adjusted R-Squared of 0.319, indicating that approximately 31.9% of the variation in the total number of FDI projects can be accounted for by the independent variables, after adjusting for the number of predictors in the model.
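As a check on this figure, the sketch below recomputes R-squared and adjusted R-squared from the sums of squares in the ANOVA table for pci_project_mlr. It uses the standard formula adj R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The inputs are copied from the printed output, and n = 64 is inferred from the total degrees of freedom of 63. The result rounds to 0.416 and 0.319, matching the table.

```r
# Values copied from the ANOVA table of pci_project_mlr
ss_regression <- 77389878.935
ss_total      <- 185984829.750
n <- 64  # observations, inferred from the total DF of 63
p <- 9   # PCI dimensions used as predictors

r2     <- ss_regression / ss_total
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(c(r2 = r2, adj_r2 = adj_r2), 3)
```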
ANOVA - Analysis of Variance (F test)
The F-ratio of 4.276 is significant at p < 0.001. Hence, our regression model is statistically significant, suggesting that at least some of the PCI dimensions meaningfully contribute to predicting the total number of FDI projects.
The model summary and ANOVA results show that the overall model has moderate explanatory power. Some independent variables, such as Business Support Policy and Labor Policy, contribute significantly. Others, such as Entry Costs and Transparency, do not show statistically significant effects.
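The significance calls above come from the Parameter Estimates table. There, each p-value and confidence bound follows directly from the Beta, its standard error, and the residual degrees of freedom. The sketch below reproduces the Labor Policy row of pci_project_mlr using the t distribution with 54 residual degrees of freedom. The inputs are copied from the printed output, so the results match the printed 3.024, 0.004, 285.354 and 1407.724 only up to rounding.

```r
# Values copied from the Labor Policy row of pci_project_mlr
beta     <- 846.539
se       <- 279.910
df_resid <- 54  # residual degrees of freedom from the ANOVA table

t_stat <- beta / se                                        # t value
p_val  <- 2 * pt(-abs(t_stat), df = df_resid)              # two-sided p-value
ci     <- beta + c(-1, 1) * qt(0.975, df = df_resid) * se  # 95% confidence interval
round(c(t = t_stat, p = p_val, lower = ci[1], upper = ci[2]), 3)
```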
pci_capital_mlr <- lm(formula = total_registered_capital ~ `Entry Costs` +
`Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021)
ols_regress(pci_capital_mlr)
Model Summary
-----------------------------------------------------------------------
R 0.739 RMSE 7912.959
R-Squared 0.546 MSE 74210280.110
Adj. R-Squared 0.470 Coef. Var 117.038
Pred R-Squared 0.373 AIC 1352.585
MAE 6437.884 SBC 1376.333
-----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
---------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
---------------------------------------------------------------------------
Regression 4821217884.446 9 535690876.050 7.219 0.0000
Residual 4007355125.955 54 74210280.110
Total 8828573010.401 63
---------------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------------------------
(Intercept) -32220.037 26329.252 -1.224 0.226 -85007.009 20566.936
`Entry Costs` -3738.724 2205.079 -0.170 -1.696 0.096 -8159.642 682.194
`Land Access` 2393.728 2741.389 0.097 0.873 0.386 -3102.425 7889.882
Transparency 252.004 1980.823 0.013 0.127 0.899 -3719.308 4223.316
`Time Costs` 5067.961 2065.484 0.326 2.454 0.017 926.914 9209.008
`Informal charges` -3828.241 2345.509 -0.226 -1.632 0.108 -8530.704 874.222
Proactivity -2310.748 2391.870 -0.113 -0.966 0.338 -7106.159 2484.663
`Business Support Policy` 5247.362 1497.272 0.347 3.505 0.001 2245.512 8249.211
`Labor Policy` 6748.025 1700.364 0.418 3.969 0.000 3339.000 10157.050
`Law & Order` -3233.972 2691.170 -0.155 -1.202 0.235 -8629.443 2161.499
-----------------------------------------------------------------------------------------------------------------
tbl_regression(pci_capital_mlr,
intercept = TRUE) %>%
add_glance_source_note(
label = list(sigma ~ "\U03C3"),
include = c(r.squared, adj.r.squared,
AIC, statistic,
p.value, sigma))
| Characteristic | Beta | 95% CI¹ | p-value |
|---|---|---|---|
| (Intercept) | -32,220 | -85,007, 20,567 | 0.2 |
| Entry Costs | -3,739 | -8,160, 682 | 0.10 |
| Land Access | 2,394 | -3,102, 7,890 | 0.4 |
| Transparency | 252 | -3,719, 4,223 | 0.9 |
| Time Costs | 5,068 | 927, 9,209 | 0.017 |
| Informal charges | -3,828 | -8,531, 874 | 0.11 |
| Proactivity | -2,311 | -7,106, 2,485 | 0.3 |
| Business Support Policy | 5,247 | 2,246, 8,249 | <0.001 |
| Labor Policy | 6,748 | 3,339, 10,157 | <0.001 |
| Law & Order | -3,234 | -8,629, 2,161 | 0.2 |

R² = 0.546; Adjusted R² = 0.470; AIC = 1,353; Statistic = 7.22; p-value < 0.001; σ = 8,615
¹ CI = Confidence Interval
Model Summary
Adjusted R-Squared of 0.470, indicating that approximately 47.0% of the variation in FDI total registered capital can be accounted for by the independent variables, adjusting for the number of predictors in the model.
ANOVA - Analysis of Variance (F test)
The F-ratio of 7.219 is significant at p < 0.001. Hence, our regression model is statistically significant, suggesting that at least some of the PCI dimensions meaningfully contribute to predicting FDI total registered capital.
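The F-ratio is the regression mean square divided by the residual mean square. The sketch below recomputes it, along with its p-value, from the sums of squares and degrees of freedom in the ANOVA table for pci_capital_mlr. The inputs are copied from the printed output.

```r
# Values copied from the ANOVA table of pci_capital_mlr
ss_regression <- 4821217884.446
ss_residual   <- 4007355125.955
df_regression <- 9
df_residual   <- 54

ms_regression <- ss_regression / df_regression  # mean square for the model
ms_residual   <- ss_residual / df_residual      # mean square error (MSE)
f_ratio <- ms_regression / ms_residual
p_value <- pf(f_ratio, df_regression, df_residual, lower.tail = FALSE)
c(F = f_ratio, p = p_value)
```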
This model summary and ANOVA indicate that the PCI dimensions explain total FDI registered capital better than they explain the number of FDI projects. The significance of specific predictors, such as Business Support Policy, Labor Policy, and Time Costs, suggests that these are influential variables in explaining FDI total registered capital.
For the total number of FDI projects, the model has an Adjusted R-Squared of 0.319, indicating that approximately 31.9% of the variation can be explained by the independent variables. The ANOVA results show a significant F-ratio of 4.276 (p < 0.001), suggesting that some PCI dimensions contribute significantly to the model. However, several predictors, such as Entry Costs and Transparency, are not statistically significant.
In contrast, the model for total registered capital has a higher Adjusted R-Squared of 0.470, indicating that 47.0% of the variation is accounted for by the predictors. The ANOVA shows a significant F-ratio of 7.219 (p < 0.001), confirming that the model is statistically significant. Key variables such as Business Support Policy, Labor Policy, and Time Costs are particularly influential in explaining FDI total registered capital.
3.3 Run Models to Select Independent Variables
I will now run different stepwise regression models to further investigate the specific positive and negative impacts of the independent variables on FDI.
This analysis will allow us to quantify how much a 1-unit increase in each selected independent variable is expected to change FDI. This gives clearer insight into each variable's contribution to both the total number of projects and the total registered capital.
All of the models start from the full models built in [3.2 Conduct Linear Regression].
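With p_val = 0.05, forward selection adds candidates one at a time. At each step it enters the candidate with the smallest p-value and stops when no remaining candidate falls below the threshold. The sketch below illustrates that idea with base R's add1() F-tests; for a single added term, this p-value equals the t-test p-value of its coefficient. It uses the built-in mtcars data as a stand-in for pci_2021. forward_select_p() is a hypothetical helper, not olsrr's implementation, and olsrr's entry rule may differ in its details.

```r
# Forward selection by p-value using base R add1() F-tests (illustrative sketch)
forward_select_p <- function(data, response, candidates, p_enter = 0.05) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # Current model: intercept only at first, then the selected predictors
    rhs <- if (length(selected) == 0) "1" else selected
    fit <- lm(reformulate(rhs, response = response), data = data)
    # F-test for adding each remaining candidate to the current model
    tests <- add1(fit, scope = remaining, test = "F")
    p <- tests[remaining, "Pr(>F)"]
    if (min(p) >= p_enter) break
    selected <- c(selected, remaining[which.min(p)])
  }
  selected
}

forward_select_p(mtcars, "mpg", c("wt", "hp", "qsec", "drat", "gear"))
```

The helper returns the selected predictor names in the order they entered, mirroring the "Stepwise Summary" tables below.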
pci_project_fw_mlr <- ols_step_forward_p(
pci_project_mlr, # this is the model
p_val = 0.05,
details = FALSE)
pci_project_fw_mlr
Stepwise Summary
------------------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
------------------------------------------------------------------------------------------
0 Base Model 1138.091 1142.409 955.661 0.00000 0.00000
1 `Business Support Policy` 1127.319 1133.796 945.026 0.18091 0.16770
2 `Labor Policy` 1121.525 1130.161 939.623 0.27482 0.25105
3 `Law & Order` 1117.240 1128.035 936.032 0.34265 0.30979
------------------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------------
R 0.585 RMSE 1382.121
R-Squared 0.343 MSE 2037609.764
Adj. R-Squared 0.310 Coef. Var 248.063
Pred R-Squared 0.162 AIC 1117.240
MAE 837.721 SBC 1128.035
---------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------
Regression 63728243.939 3 21242747.980 10.425 0.0000
Residual 122256585.811 60 2037609.764
Total 185984829.750 63
--------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------------------
(Intercept) -3908.420 2946.943 -1.326 0.190 -9803.183 1986.343
`Business Support Policy` 752.745 234.838 0.343 3.205 0.002 282.998 1222.492
`Labor Policy` 883.432 255.821 0.377 3.453 0.001 371.713 1395.151
`Law & Order` -815.756 327.846 -0.270 -2.488 0.016 -1471.546 -159.967
--------------------------------------------------------------------------------------------------------------
plot(pci_project_fw_mlr)
#| fig-width: 12
#| fig-height: 10
pci_project_bw_mlr <- ols_step_backward_p(
pci_project_mlr, # this is the model
p_val = 0.05,
details = FALSE)
pci_project_bw_mlr
Stepwise Summary
-----------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-----------------------------------------------------------------------------------
0 Full Model 1121.656 1145.404 943.667 0.41611 0.31879
1 Proactivity 1119.929 1141.518 941.482 0.41362 0.32833
2 `Land Access` 1118.159 1137.589 939.287 0.41151 0.33794
3 `Informal charges` 1117.405 1134.676 937.879 0.39994 0.33677
4 `Time Costs` 1116.714 1131.826 936.614 0.38754 0.33474
5 Transparency 1116.758 1129.712 936.060 0.36766 0.32478
6 `Entry Costs` 1117.240 1128.035 936.032 0.34265 0.30979
-----------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------------
R 0.585 RMSE 1382.121
R-Squared 0.343 MSE 2037609.764
Adj. R-Squared 0.310 Coef. Var 248.063
Pred R-Squared 0.162 AIC 1117.240
MAE 837.721 SBC 1128.035
---------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------
Regression 63728243.939 3 21242747.980 10.425 0.0000
Residual 122256585.811 60 2037609.764
Total 185984829.750 63
--------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------------------
(Intercept) -3908.420 2946.943 -1.326 0.190 -9803.183 1986.343
`Business Support Policy` 752.745 234.838 0.343 3.205 0.002 282.998 1222.492
`Labor Policy` 883.432 255.821 0.377 3.453 0.001 371.713 1395.151
`Law & Order` -815.756 327.846 -0.270 -2.488 0.016 -1471.546 -159.967
--------------------------------------------------------------------------------------------------------------
plot(pci_project_bw_mlr)
#| fig-width: 12
#| fig-height: 10
pci_project_sb_mlr <- ols_step_both_p(
pci_project_mlr, # this is the model
p_val = 0.05,
details = FALSE)
pci_project_sb_mlr
Stepwise Summary
----------------------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
----------------------------------------------------------------------------------------------
0 Base Model 1138.091 1142.409 955.661 0.00000 0.00000
1 `Business Support Policy` (+) 1127.319 1133.796 945.026 0.18091 0.16770
2 `Labor Policy` (+) 1121.525 1130.161 939.623 0.27482 0.25105
3 `Law & Order` (+) 1117.240 1128.035 936.032 0.34265 0.30979
----------------------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
---------------------------------------------------------------------
R 0.585 RMSE 1382.121
R-Squared 0.343 MSE 2037609.764
Adj. R-Squared 0.310 Coef. Var 248.063
Pred R-Squared 0.162 AIC 1117.240
MAE 837.721 SBC 1128.035
---------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
--------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------------
Regression 63728243.939 3 21242747.980 10.425 0.0000
Residual 122256585.811 60 2037609.764
Total 185984829.750 63
--------------------------------------------------------------------------
Parameter Estimates
--------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
--------------------------------------------------------------------------------------------------------------
(Intercept) -3908.420 2946.943 -1.326 0.190 -9803.183 1986.343
`Business Support Policy` 752.745 234.838 0.343 3.205 0.002 282.998 1222.492
`Labor Policy` 883.432 255.821 0.377 3.453 0.001 371.713 1395.151
`Law & Order` -815.756 327.846 -0.270 -2.488 0.016 -1471.546 -159.967
--------------------------------------------------------------------------------------------------------------
plot(pci_project_sb_mlr)
pci_capital_fw_mlr <- ols_step_forward_p(
pci_capital_mlr, # this is the model
p_val = 0.05,
details = FALSE)
pci_capital_fw_mlr
Stepwise Summary
-------------------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-------------------------------------------------------------------------------------------
0 Base Model 1385.136 1389.454 1202.161 0.00000 0.00000
1 `Business Support Policy` 1367.982 1374.459 1185.110 0.25865 0.24669
2 `Labor Policy` 1355.344 1363.979 1173.177 0.41022 0.39089
3 `Law & Order` 1351.950 1362.744 1170.264 0.45789 0.43078
-------------------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
-----------------------------------------------------------------------
R 0.677 RMSE 8647.671
R-Squared 0.458 MSE 79767687.852
Adj. R-Squared 0.431 Coef. Var 121.341
Pred R-Squared 0.371 AIC 1351.950
MAE 6737.299 SBC 1362.744
-----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
-----------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-----------------------------------------------------------------------------
Regression 4042511739.270 3 1347503913.090 16.893 0.0000
Residual 4786061271.131 60 79767687.852
Total 8828573010.401 63
-----------------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------------------------
(Intercept) -44762.474 18438.462 -2.428 0.018 -81644.889 -7880.059
`Business Support Policy` 6351.523 1469.340 0.420 4.323 0.000 3412.406 9290.640
`Labor Policy` 7262.798 1600.626 0.450 4.537 0.000 4061.069 10464.526
`Law & Order` -4711.518 2051.268 -0.227 -2.297 0.025 -8814.666 -608.370
-----------------------------------------------------------------------------------------------------------------
plot(pci_capital_fw_mlr)
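The forward selection idea behind `ols_step_forward_p()` can be sketched in base R. This is a simplified illustration under the assumption that, at each step, the candidate with the smallest t-test p-value in the enlarged model enters if that p-value is below `p_enter`. It is not olsrr's actual implementation, and the data and the names `x1` to `x4` are simulated and hypothetical.

```r
# Simplified forward selection by p-value (illustrative sketch, base R only).
# Assumption: mirrors the idea behind olsrr::ols_step_forward_p(),
# not its exact algorithm or printed output.
forward_select_p <- function(data, response, candidates, p_enter = 0.05) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # p-value of each remaining candidate when added to the current model
    pvals <- vapply(remaining, function(v) {
      f <- reformulate(c(selected, v), response = response)
      summary(lm(f, data = data))$coefficients[v, "Pr(>|t|)"]
    }, numeric(1))
    best <- names(which.min(pvals))
    if (pvals[[best]] > p_enter) break   # no remaining candidate is significant
    selected <- c(selected, best)
  }
  selected
}

# Usage on simulated data (hypothetical x1..x4; only x1 and x2 affect y)
set.seed(42)
n <- 64
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 3 * d$x1 + 2 * d$x2 + rnorm(n)
sel <- forward_select_p(d, "y", c("x1", "x2", "x3", "x4"))
print(sel)
```

The sketch uses syntactic names on purpose: with names such as `Labor Policy`, `lm()` labels the coefficient row with backticks, so the row lookup would need adjusting.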
# fig-width: 12
# fig-height: 10
pci_capital_bw_mlr <- ols_step_backward_p(
pci_capital_mlr, # this is the model
p_val = 0.05,
details = FALSE)
pci_capital_bw_mlr
Stepwise Summary
------------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
------------------------------------------------------------------------------------
0 Full Model 1352.585 1376.333 1174.596 0.54609 0.47044
1 Transparency 1350.604 1372.193 1172.239 0.54596 0.47991
2 `Land Access` 1349.491 1368.921 1170.506 0.53962 0.48208
3 Proactivity 1348.523 1365.794 1168.952 0.53214 0.48289
4 `Informal charges` 1348.492 1363.604 1168.222 0.51752 0.47593
5 `Entry Costs` 1350.489 1363.443 1169.336 0.48642 0.45161
6 `Time Costs` 1351.950 1362.744 1170.264 0.45789 0.43078
------------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
-----------------------------------------------------------------------
R 0.677 RMSE 8647.671
R-Squared 0.458 MSE 79767687.852
Adj. R-Squared 0.431 Coef. Var 121.341
Pred R-Squared 0.371 AIC 1351.950
MAE 6737.299 SBC 1362.744
-----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
-----------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
-----------------------------------------------------------------------------
Regression 4042511739.270 3 1347503913.090 16.893 0.0000
Residual 4786061271.131 60 79767687.852
Total 8828573010.401 63
-----------------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------------------------------
(Intercept) -44762.474 18438.462 -2.428 0.018 -81644.889 -7880.059
`Business Support Policy` 6351.523 1469.340 0.420 4.323 0.000 3412.406 9290.640
`Labor Policy` 7262.798 1600.626 0.450 4.537 0.000 4061.069 10464.526
`Law & Order` -4711.518 2051.268 -0.227 -2.297 0.025 -8814.666 -608.370
-----------------------------------------------------------------------------------------------------------------
plot(pci_capital_bw_mlr)
# fig-width: 12
# fig-height: 10
pci_capital_sb_mlr <- ols_step_both_p(
pci_capital_mlr, # this is the model
p_val = 0.05,
details = FALSE)
pci_capital_sb_mlr
Stepwise Summary
-----------------------------------------------------------------------------------------------
Step Variable AIC SBC SBIC R2 Adj. R2
-----------------------------------------------------------------------------------------------
0 Base Model 1385.136 1389.454 1202.161 0.00000 0.00000
1 `Business Support Policy` (+) 1367.982 1374.459 1185.110 0.25865 0.24669
2 `Labor Policy` (+) 1355.344 1363.979 1173.177 0.41022 0.39089
3 `Law & Order` (+) 1351.950 1362.744 1170.264 0.45789 0.43078
4 `Time Costs` (+) 1350.489 1363.443 1169.336 0.48642 0.45161
5 `Entry Costs` (+) 1348.492 1363.604 1168.222 0.51752 0.47593
-----------------------------------------------------------------------------------------------
Final Model Output
------------------
Model Summary
-----------------------------------------------------------------------
R 0.719 RMSE 8158.221
R-Squared 0.518 MSE 73441733.084
Adj. R-Squared 0.476 Coef. Var 116.430
Pred R-Squared 0.414 AIC 1348.492
MAE 6522.401 SBC 1363.604
-----------------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
AIC: Akaike Information Criteria
SBC: Schwarz Bayesian Criteria
ANOVA
----------------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
----------------------------------------------------------------------------
Regression 4568952491.502 5 913790498.300 12.442 0.0000
Residual 4259620518.899 58 73441733.084
Total 8828573010.401 63
----------------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------------------------------
(Intercept) -30495.886 19965.179 -1.527 0.132 -70460.535 9468.762
`Business Support Policy` 5572.081 1467.377 0.368 3.797 0.000 2634.807 8509.355
`Labor Policy` 6085.465 1630.387 0.377 3.733 0.000 2821.892 9349.039
`Law & Order` -4842.805 2152.763 -0.233 -2.250 0.028 -9152.029 -533.581
`Time Costs` 3741.369 1713.689 0.241 2.183 0.033 311.048 7171.689
`Entry Costs` -4197.282 2170.954 -0.191 -1.933 0.058 -8542.918 148.354
----------------------------------------------------------------------------------------------------------------
plot(pci_capital_sb_mlr)
3.4 Model Selection
Next, we use a radar chart to compare the performance of the different models.
The model whose polygon reaches the outer boundary on the most axes is considered the best performer, indicating stronger overall results across the evaluated metrics.
project_metric <- compare_performance(pci_project_mlr,
pci_project_fw_mlr$model,
pci_project_bw_mlr$model,
pci_project_sb_mlr$model)
Some of the nested models seem to be identical
project_metric$Name <- gsub(".*\\\\([a-zA-Z0-9_]+)\\\\, \\\\model\\\\.*", "\\1", project_metric$Name)
# plot radar
plot(project_metric)
capital_metric <- compare_performance(pci_capital_mlr,
pci_capital_fw_mlr$model,
pci_capital_bw_mlr$model,
pci_capital_sb_mlr$model)
capital_metric$Name <- gsub(".*\\\\([a-zA-Z0-9_]+)\\\\, \\\\model\\\\.*", "\\1", capital_metric$Name)
# plot radar
plot(capital_metric)
For predicting the total number of projects, the best-performing model is pci_project_sb_mlr.
For predicting total registered capital, the best-performing model is pci_capital_sb_mlr.
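Several of the metrics summarized in the radar chart can also be tabulated directly. The sketch below computes AIC, BIC, R², adjusted R² and RMSE (taken here as the square root of the mean squared residual) for a named list of `lm()` fits. It is a base R illustration, not a replacement for `compare_performance()`, and the models and data are simulated.

```r
# Tabulate common fit metrics for several lm() fits (base R sketch).
model_metrics <- function(models) {
  data.frame(
    Name   = names(models),
    AIC    = vapply(models, AIC, numeric(1)),
    BIC    = vapply(models, BIC, numeric(1)),
    R2     = vapply(models, function(m) summary(m)$r.squared, numeric(1)),
    R2_adj = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
    RMSE   = vapply(models, function(m) sqrt(mean(residuals(m)^2)), numeric(1))
  )
}

# Simulated data and two nested candidate models (hypothetical)
set.seed(1)
d <- data.frame(x1 = rnorm(64), x2 = rnorm(64), x3 = rnorm(64))
d$y <- 2 * d$x1 + d$x2 + rnorm(64)
fits <- list(full    = lm(y ~ x1 + x2 + x3, data = d),
             reduced = lm(y ~ x1 + x2, data = d))
metrics <- model_metrics(fits)
print(metrics)
```

Plain R² never decreases when a predictor is added, which is why adjusted R², AIC and BIC are the more useful columns when comparing nested models.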
3.5 Visualize model parameters
We now use the best-performing models to quantify the direction and size of the change in the dependent variable associated with a one-unit change in each independent variable, holding the other variables constant.
ggcoefstats(pci_project_sb_mlr$model,
sort = "ascending")
ggcoefstats(pci_capital_sb_mlr$model,
sort = "ascending")
To attract more Foreign Direct Investment (FDI) projects and registered capital, policymakers should prioritize improvements in Labor Policy and Business Support Policy, the two dimensions with positive and significant coefficients in both selected models.
Holding the other predictors constant, a one-unit increase in the Labor Policy score is associated with about 883 more FDI projects and about 6,085 more in total registered capital (in the dataset's units).
Similarly, a one-unit increase in the Business Support Policy score is associated with about 753 more FDI projects and about 5,572 more in total registered capital.
Focusing on these two policy areas therefore appears to offer the greatest leverage for attracting more FDI.
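The "one-unit change" reading of a coefficient can be checked numerically: in a linear model, the difference between two predictions that differ by one unit in a single predictor equals that predictor's coefficient. The variables `labor` and `support` below are hypothetical and simulated, not the PCI data.

```r
# One-unit change in a predictor shifts the prediction by its coefficient.
set.seed(7)
d <- data.frame(labor = runif(64, 5, 8), support = runif(64, 4, 8))
d$projects <- 500 + 800 * d$labor + 600 * d$support + rnorm(64, sd = 300)
fit <- lm(projects ~ labor + support, data = d)

base   <- data.frame(labor = 6, support = 6)
bumped <- data.frame(labor = 7, support = 6)   # labor +1, support held fixed
delta  <- unname(predict(fit, bumped) - predict(fit, base))
print(c(delta = delta, coefficient = unname(coef(fit)[["labor"]])))
```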
3.6 Checking for multicollinearity
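We now complement the correlation matrix with the Variance Inflation Factor (VIF), which measures how much the variance of each coefficient is inflated by correlation with the other predictors.
Interpretation
< 5: low multicollinearity
5-10: moderate multicollinearity
> 10: strong multicollinearity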
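A VIF can be computed by regressing each predictor on all the others: VIF = 1 / (1 - R²). The tables below are consistent with this: their "Tolerance" column equals 1 / VIF and their "Increased SE" column equals the square root of VIF. The sketch uses simulated data in which `x3` is deliberately built from `x1`.

```r
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all other predictors (base R sketch).
vif_manual <- function(data, predictors) {
  vapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

set.seed(3)
n <- 64
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.2)   # nearly a copy of x1 (simulated collinearity)
d <- data.frame(x1, x2, x3)

vifs <- vif_manual(d, c("x1", "x2", "x3"))
print(round(cbind(VIF = vifs, Tolerance = 1 / vifs, Increased_SE = sqrt(vifs)), 2))
```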
check_collinearity(pci_project_sb_mlr$model) # Check for Multicollinearity
Low Correlation
Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
Business Support Policy 1.04 [1.00, 12.89] 1.02 0.96 [0.08, 1.00]
Labor Policy 1.09 [1.00, 2.65] 1.04 0.92 [0.38, 1.00]
Law & Order 1.08 [1.00, 3.13] 1.04 0.93 [0.32, 1.00]
plot(check_collinearity(pci_project_sb_mlr$model)) +
theme(axis.text.x = element_text(
angle = 45,
hjust = 1
))
Variable `Component` is not in your data frame :/

check_collinearity(pci_capital_sb_mlr$model) # Check for Multicollinearity
Low Correlation
Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
Business Support Policy 1.13 [1.02, 2.03] 1.06 0.89 [0.49, 0.98]
Labor Policy 1.23 [1.06, 1.89] 1.11 0.81 [0.53, 0.95]
Law & Order 1.29 [1.09, 1.93] 1.13 0.78 [0.52, 0.92]
Time Costs 1.46 [1.19, 2.13] 1.21 0.68 [0.47, 0.84]
Entry Costs 1.17 [1.03, 1.91] 1.08 0.85 [0.52, 0.97]
plot(check_collinearity(pci_capital_sb_mlr$model)) +
theme(axis.text.x = element_text(
angle = 45,
hjust = 1
))
Variable `Component` is not in your data frame :/

No multicollinearity was found in either model: every VIF is below 1.5 in both the Total Number of Projects and the Total Registered Capital models.
3.7 Linearity Assumption Test
project_out <- plot(check_model(pci_project_sb_mlr$model,
panel = FALSE))
For confidence bands, please install `qqplotr`.
project_out[[2]]
capital_out <- plot(check_model(pci_capital_sb_mlr$model,
panel = FALSE))
For confidence bands, please install `qqplotr`.
capital_out[[2]]
3.8 Normality Assumption Test
plot(check_normality(pci_project_sb_mlr$model))
For confidence bands, please install `qqplotr`.

plot(check_normality(pci_capital_sb_mlr$model))
For confidence bands, please install `qqplotr`.

3.9 Checking of outliers
project_outliers <- check_outliers(pci_project_sb_mlr$model,
method = "cook")
project_outliers
1 outlier detected: case 30.
- Based on the following method and threshold: cook (0.849).
- For variable: (Whole model).
plot(project_outliers <- check_outliers(pci_project_sb_mlr$model,
method = "cook"))
capital_outliers <- check_outliers(pci_capital_sb_mlr$model,
method = "cook")
capital_outliers
OK: No outliers detected.
- Based on the following method and threshold: cook (0.902).
- For variable: (Whole model)
plot(capital_outliers <- check_outliers(pci_capital_sb_mlr$model,
method = "cook"))
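The Cook's distance thresholds printed above (0.849 and 0.902) look consistent with the median of an F distribution with p and n - p degrees of freedom, where p is the number of coefficients. The sketch below assumes that rule; it is an illustration rather than the performance package's code, and case 30 in the simulated data is made influential on purpose.

```r
# Flag influential observations with Cook's distance (base R sketch).
# Assumption: threshold = median of F(p, n - p), which appears to match the
# cut-offs reported by performance::check_outliers(method = "cook").
flag_cook <- function(fit) {
  p <- length(coef(fit))
  n <- nobs(fit)
  threshold <- qf(0.5, p, n - p)
  cd <- cooks.distance(fit)
  list(threshold = threshold, outliers = which(cd > threshold))
}

set.seed(11)
n <- 64
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + d$x1 - d$x2 + rnorm(n)
d$x1[30] <- 4    # give case 30 high leverage (simulated)
d$y[30] <- 25    # and a response far from the fitted line
res <- flag_cook(lm(y ~ x1 + x2 + x3, data = d))
print(res)
```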
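After conducting the tests, I conclude that both the Total Number of Projects and Total Registered Capital models broadly meet the necessary assumptions. The one exception is case 30 in the Total Number of Projects model, which Cook's distance flags as an influential outlier (threshold 0.849), so that model's estimates should be read with this observation in mind.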
4.0 Spatial Non-Stationary Assumption
project_mlr_output <- as.data.frame(pci_project_sb_mlr$model$residuals) %>%
rename(`SB_MLR_RES` = `pci_project_sb_mlr$model$residuals`)
# join the newly created data frame
project_fdi_sf <- cbind(provincial_boundaries,
project_mlr_output$SB_MLR_RES) %>%
rename(`MLR_RES` = `project_mlr_output.SB_MLR_RES`)
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(provincial_boundaries)+
tmap_options(check.and.fix = TRUE) +
tm_polygons(alpha = 0.4) +
tm_shape(project_fdi_sf) +
tm_polygons(col = "MLR_RES",
alpha = 0.6,
size = 0.3,
style="quantile")
Warning: The shape provincial_boundaries is invalid (after reprojection). See
sf::st_is_valid
Warning: The shape project_fdi_sf is invalid (after reprojection). See
sf::st_is_valid
Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tmap_mode("plot")
tmap mode set to plotting
# compute k-nearest-neighbour (k = 6) neighbours and row-standardized weights using st_knn() and st_weights() from sfdep.
project_fdi_sf <- project_fdi_sf %>%
mutate(nb = st_knn(geometry, k=6,
longlat = FALSE),
wt = st_weights(nb,
style = "W"),
.before = 1)
! Polygon provided. Using point on surface.
# global moran_perm
global_moran_perm(project_fdi_sf$MLR_RES,
project_fdi_sf$nb,
project_fdi_sf$wt,
alternative = "two.sided",
nsim = 999)
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = -0.022142, observed rank = 470, p-value = 0.94
alternative hypothesis: two.sided
capital_mlr_output <- as.data.frame(pci_capital_sb_mlr$model$residuals) %>%
rename(`SB_MLR_RES` = `pci_capital_sb_mlr$model$residuals`)
# join the newly created data frame
capital_fdi_sf <- cbind(provincial_boundaries,
capital_mlr_output$SB_MLR_RES) %>%
rename(`MLR_RES` = `capital_mlr_output.SB_MLR_RES`)
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(provincial_boundaries)+
tmap_options(check.and.fix = TRUE) +
tm_polygons(alpha = 0.4) +
tm_shape(capital_fdi_sf) +
tm_polygons(col = "MLR_RES",
alpha = 0.6,
size = 0.3,
style="quantile")
Warning: The shape provincial_boundaries is invalid (after reprojection). See
sf::st_is_valid
Warning: The shape capital_fdi_sf is invalid (after reprojection). See
sf::st_is_valid
Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tmap_mode("plot")
tmap mode set to plotting
# compute k-nearest-neighbour (k = 6) neighbours and row-standardized weights using st_knn() and st_weights() from sfdep.
capital_fdi_sf <- capital_fdi_sf %>%
mutate(nb = st_knn(geometry, k=6,
longlat = FALSE),
wt = st_weights(nb,
style = "W"),
.before = 1)
! Polygon provided. Using point on surface.
# global moran_perm
global_moran_perm(capital_fdi_sf$MLR_RES,
capital_fdi_sf$nb,
capital_fdi_sf$wt,
alternative = "two.sided",
nsim = 999)
Monte-Carlo simulation of Moran I
data: x
weights: listw
number of simulations + 1: 1000
statistic = -0.016383, observed rank = 556, p-value = 0.888
alternative hypothesis: two.sided
Based on the Moran's I permutation tests on the model residuals, there is no evidence of significant spatial autocorrelation in either model (Moran's I = -0.022, p = 0.94 for total projects; Moran's I = -0.016, p = 0.888 for total registered capital). The residuals therefore show no systematic clustering or dispersion across the study area.
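The permutation test used above can be sketched from first principles: build k-nearest-neighbour, row-standardized weights from point locations (the sfdep message above notes that a point on each polygon's surface is used), compute Moran's I, and compare it with values obtained after shuffling the residuals across locations. The two-sided p-value rule here is a common convention and may differ in detail from what `global_moran_perm()` reports; the coordinates and residuals are simulated.

```r
# Moran's I with k-nearest-neighbour, row-standardized weights (base R sketch).
knn_weights <- function(coords, k) {
  D <- as.matrix(dist(coords))
  n <- nrow(D)
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    di <- D[i, ]
    di[i] <- Inf                              # a point is not its own neighbour
    W[i, order(di)[seq_len(k)]] <- 1 / k      # row-standardized ("W" style)
  }
  W
}

# I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Permutation test: shuffle values over locations nsim times
moran_perm <- function(x, W, nsim = 999) {
  obs <- morans_i(x, W)
  sims <- replicate(nsim, morans_i(sample(x), W))
  p_hi <- (sum(sims >= obs) + 1) / (nsim + 1)
  p_lo <- (sum(sims <= obs) + 1) / (nsim + 1)
  list(statistic = obs, p.value = min(1, 2 * min(p_hi, p_lo)))
}

set.seed(99)
coords <- cbind(runif(63), runif(63))
W <- knn_weights(coords, k = 6)
random_res <- moran_perm(rnorm(63), W)                             # no pattern
trend_res  <- moran_perm(5 * coords[, 1] + rnorm(63, sd = 0.3), W) # east-west trend
print(c(random_p = random_res$p.value, trend_p = trend_res$p.value))
```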
5.0 Local
Preparing the data
pci_2021 <- pci_2021 %>%
left_join(provincial_boundaries %>%
select(province_code, geometry),
by = "province_code") %>%
st_as_sf()
Warning in left_join(., provincial_boundaries %>% select(province_code, : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 11 of `x` matches multiple rows in `y`.
ℹ Row 2 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
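The warning above means `province_code` is not unique in at least one of the joined tables, and every duplicate multiplies the matching rows. This may be related to the GWR output below reporting 66 data points, whereas the stepwise OLS ANOVA tables imply 64 observations (Total DF = 63). The toy example below (hypothetical data frames, base R `merge()` standing in for `left_join()`) shows how a duplicated key inflates a left join and how to detect it beforehand. For `sf` boundaries, dissolving multi-part rows with `group_by(province_code)` followed by `summarise()` is one common way to obtain one row per province.

```r
# A duplicated key on the right-hand side inflates a left join (toy data).
pci    <- data.frame(province_code = c(1, 2, 3), score = c(60, 65, 70))
bounds <- data.frame(province_code = c(1, 2, 2, 3),
                     part = c("a", "b1", "b2", "c"))  # province 2 has two parts

# Detect duplicated keys before joining
dup_keys <- unique(bounds$province_code[duplicated(bounds$province_code)])
print(dup_keys)

joined <- merge(pci, bounds, by = "province_code", all.x = TRUE)
print(nrow(joined))   # one extra row for province 2

# Keep one row per key, then join again
bounds_unique <- bounds[!duplicated(bounds$province_code), ]
joined_fixed  <- merge(pci, bounds_unique, by = "province_code", all.x = TRUE)
print(nrow(joined_fixed))
```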
Fixed vs Adaptive Bandwidth
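The two kernels used in this section and the difference between a fixed and an adaptive bandwidth can be sketched as follows. The kernel shapes follow the usual GWR definitions (boxcar: weight 1 inside the bandwidth, 0 outside; bisquare: (1 - (d/b)²)² inside, 0 outside); treat this as an illustrative re-implementation rather than GWmodel's source. In the adaptive case, the bandwidth is set here so that the k nearest points, including the regression point itself, receive non-zero weight. The coordinates are simulated.

```r
# Kernel weights as a function of distance d and bandwidth b.
boxcar_kernel   <- function(d, b) as.numeric(d < b)
bisquare_kernel <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

# Fixed bandwidth: one distance b for every regression point.
# Adaptive bandwidth: b is set per point so that exactly k points
# (including the point itself) lie strictly within it.
adaptive_bandwidth <- function(dists, k) sort(dists)[k + 1]

set.seed(5)
coords <- cbind(runif(66, 0, 1e6), runif(66, 0, 1e6))  # hypothetical metres
d_from_1 <- sqrt(colSums((t(coords) - coords[1, ])^2)) # distances to point 1

w_fixed <- bisquare_kernel(d_from_1, b = 5e5)
w_adapt <- boxcar_kernel(d_from_1, b = adaptive_bandwidth(d_from_1, k = 20))
print(c(nonzero_fixed = sum(w_fixed > 0), nonzero_adaptive = sum(w_adapt > 0)))
```

A fixed bandwidth gives each regression point a different number of neighbours depending on local density, while an adaptive bandwidth keeps the neighbour count constant and lets the distance vary.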
bw.fixed_project <- bw.gwr(formula = total_project_count ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
approach="CV",
kernel="boxcar",
adaptive=FALSE,
longlat=FALSE)
Fixed bandwidth: 968784.2 CV score: 222394245
Fixed bandwidth: 598861.3 CV score: 253731784
Fixed bandwidth: 1197409 CV score: 166709145
Fixed bandwidth: 1338707 CV score: 171205597
Fixed bandwidth: 1110082 CV score: 208175729
Fixed bandwidth: 1251380 CV score: 171769613
Fixed bandwidth: 1164053 CV score: 161667008
Fixed bandwidth: 1143438 CV score: 162515312
Fixed bandwidth: 1176794 CV score: 165824047
Fixed bandwidth: 1156179 CV score: 161322847
Fixed bandwidth: 1151312 CV score: 161298848
Fixed bandwidth: 1148305 CV score: 160623353
Fixed bandwidth: 1146446 CV score: 163009911
Fixed bandwidth: 1149453 CV score: 160891544
Fixed bandwidth: 1147595 CV score: 160623353
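With `approach = "CV"`, each candidate bandwidth is scored by leave-one-out cross-validation: every observation is predicted from a local weighted regression that excludes it, and the squared prediction errors are summed. Judging from the trace above, `bw.gwr()` narrows in on the minimum with an iterative (golden-section style) search; the sketch below uses a simple grid instead. The data, in which the slope drifts from west to east, are simulated.

```r
# Leave-one-out CV score for one candidate bandwidth (illustrative sketch).
bisquare <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

gwr_cv_score <- function(X, y, coords, bw, kernel = bisquare) {
  D <- as.matrix(dist(coords))
  errs <- numeric(nrow(X))
  for (i in seq_len(nrow(X))) {
    w <- kernel(D[i, ], bw)
    w[i] <- 0                                  # leave observation i out
    beta <- lm.wfit(X, y, w)$coefficients      # local weighted least squares
    errs[i] <- y[i] - sum(X[i, ] * beta)       # prediction error at point i
  }
  sum(errs^2)
}

# Simulated data: the slope on x increases from west to east (hypothetical)
set.seed(8)
n <- 66
coords <- cbind(runif(n), runif(n))
x <- rnorm(n)
y <- 1 + (1 + 2 * coords[, 1]) * x + rnorm(n, sd = 0.3)
X <- cbind(Intercept = 1, x = x)

# Simple grid search in place of bw.gwr()'s iterative search
bws <- c(0.5, 0.75, 1, 1.5, 2)
scores <- vapply(bws, function(b) gwr_cv_score(X, y, coords, b), numeric(1))
best_bw <- bws[which.min(scores)]
print(data.frame(bandwidth = bws, cv_score = scores))
```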
gwr.fixed_project <- gwr.basic(formula = total_project_count ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
bw=bw.fixed_project,
kernel = 'boxcar',
longlat = FALSE)
gwr.fixed_project
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2024-11-11 01:20:13.434497
Call:
gwr.basic(formula = total_project_count ~ `Entry Costs` + `Land Access` +
Transparency + `Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` + `Law & Order`,
data = pci_2021, bw = bw.fixed_project, kernel = "boxcar",
longlat = FALSE)
Dependent (y) variable: total_project_count
Independent variables: Entry Costs Land Access Transparency Time Costs Informal charges Proactivity Business Support Policy Labor Policy Law & Order
Number of data points: 66
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-1500.4 -720.6 -243.1 295.9 7609.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1303.9 4351.0 0.300 0.76553
`Entry Costs` -499.2 361.4 -1.381 0.17264
`Land Access` 250.5 453.1 0.553 0.58261
Transparency -360.6 327.3 -1.102 0.27531
`Time Costs` 411.7 340.4 1.210 0.23152
`Informal charges` -454.6 388.0 -1.172 0.24628
Proactivity -142.3 394.6 -0.361 0.71970
`Business Support Policy` 611.7 243.7 2.510 0.01499 *
`Labor Policy` 810.5 280.4 2.891 0.00546 **
`Law & Order` -668.3 444.6 -1.503 0.13846
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1425 on 56 degrees of freedom
Multiple R-squared: 0.3885
Adjusted R-squared: 0.2902
F-statistic: 3.953 on 9 and 56 DF, p-value: 0.0006067
***Extra Diagnostic information
Residual sum of squares: 113728132
Sigma(hat): 1333.042
AIC: 1157.038
AICc: 1161.927
BIC: 1161.21
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: boxcar
Fixed bandwidth: 1147595
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu. Max.
Intercept -2029.95 -637.88 596.48 1123.48 1882.8340
`Entry Costs` -749.00 -545.78 -493.11 -414.02 -281.8974
`Land Access` -201.59 218.75 264.32 435.01 548.4379
Transparency -658.25 -383.53 -360.58 -317.80 8.2392
`Time Costs` 333.00 407.81 412.38 445.83 514.3275
`Informal charges` -671.17 -532.00 -452.95 -323.95 1.2336
Proactivity -289.04 -175.30 -142.30 -21.09 89.7491
`Business Support Policy` 310.81 608.78 687.04 719.28 827.6244
`Labor Policy` 589.33 765.44 834.32 923.34 1060.9729
`Law & Order` -1003.16 -724.69 -674.45 -636.49 -363.5316
************************Diagnostic information*************************
Number of data points: 66
Effective number of parameters (2trace(S) - trace(S'S)): 13.16825
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 52.83175
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 1163.596
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 1139.972
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 1115.975
Residual sum of squares: 100389487
R-square value: 0.4602377
Adjusted R-square value: 0.3231069
***********************************************************************
Program stops at: 2024-11-11 01:20:13.460021
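The "Summary of GWR coefficient estimates" table is a five-number summary of the local coefficients, one set per regression point. The sketch below fits a weighted least squares regression at every point with a boxcar kernel and summarizes the resulting coefficients by quantiles. It illustrates the idea on simulated data and does not reproduce `gwr.basic()` internals.

```r
# Local coefficients at every regression point, then a five-number summary.
boxcar <- function(d, b) as.numeric(d < b)

gwr_local_coefs <- function(X, y, coords, bw, kernel = boxcar) {
  D <- as.matrix(dist(coords))
  B <- t(vapply(seq_len(nrow(X)), function(i) {
    lm.wfit(X, y, kernel(D[i, ], bw))$coefficients   # local weighted fit
  }, numeric(ncol(X))))
  colnames(B) <- colnames(X)
  B
}

# Simulated data: the slope on x increases from west to east (hypothetical)
set.seed(8)
n <- 66
coords <- cbind(runif(n), runif(n))
x <- rnorm(n)
y <- 1 + (1 + 2 * coords[, 1]) * x + rnorm(n, sd = 0.3)
X <- cbind(Intercept = 1, x = x)

B <- gwr_local_coefs(X, y, coords, bw = 0.6)
coef_summary <- t(apply(B, 2, quantile))   # Min, 1st Qu., Median, 3rd Qu., Max
print(round(coef_summary, 3))
```

A wide spread between the minimum and maximum of a coefficient is what suggests that its relationship with the dependent variable varies across space.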
bw.adaptive_project <- bw.gwr(formula = total_project_count ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
approach="CV",
kernel="boxcar",
adaptive=TRUE,
longlat=FALSE)
Adaptive bandwidth: 48 CV score: 204424128
Adaptive bandwidth: 38 CV score: 235355686
Adaptive bandwidth: 56 CV score: 181731389
Adaptive bandwidth: 59 CV score: 174435740
Adaptive bandwidth: 63 CV score: 168499922
Adaptive bandwidth: 63 CV score: 168499922
gwr.adaptive_project <- gwr.basic(formula = total_project_count ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
bw=bw.adaptive_project,
kernel = 'boxcar',
adaptive=TRUE,
longlat = FALSE)
gwr.adaptive_project
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2024-11-11 01:20:13.508375
Call:
gwr.basic(formula = total_project_count ~ `Entry Costs` + `Land Access` +
Transparency + `Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` + `Law & Order`,
data = pci_2021, bw = bw.adaptive_project, kernel = "boxcar",
adaptive = TRUE, longlat = FALSE)
Dependent (y) variable: total_project_count
Independent variables: Entry Costs Land Access Transparency Time Costs Informal charges Proactivity Business Support Policy Labor Policy Law & Order
Number of data points: 66
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-1500.4 -720.6 -243.1 295.9 7609.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1303.9 4351.0 0.300 0.76553
`Entry Costs` -499.2 361.4 -1.381 0.17264
`Land Access` 250.5 453.1 0.553 0.58261
Transparency -360.6 327.3 -1.102 0.27531
`Time Costs` 411.7 340.4 1.210 0.23152
`Informal charges` -454.6 388.0 -1.172 0.24628
Proactivity -142.3 394.6 -0.361 0.71970
`Business Support Policy` 611.7 243.7 2.510 0.01499 *
`Labor Policy` 810.5 280.4 2.891 0.00546 **
`Law & Order` -668.3 444.6 -1.503 0.13846
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1425 on 56 degrees of freedom
Multiple R-squared: 0.3885
Adjusted R-squared: 0.2902
F-statistic: 3.953 on 9 and 56 DF, p-value: 0.0006067
***Extra Diagnostic information
Residual sum of squares: 113728132
Sigma(hat): 1333.042
AIC: 1157.038
AICc: 1161.927
BIC: 1161.21
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: boxcar
Adaptive bandwidth: 63 (number of nearest neighbours)
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu. Max.
Intercept 214.29 214.29 971.88 971.88 1420.200
`Entry Costs` -575.15 -575.15 -561.30 -561.30 -463.273
`Land Access` 221.39 221.39 289.30 353.31 353.307
Transparency -433.66 -353.89 -353.89 -333.83 -333.826
`Time Costs` 359.70 414.10 414.10 468.73 468.725
`Informal charges` -487.83 -456.36 -456.36 -356.18 -356.180
Proactivity -187.82 -187.82 -140.26 -139.39 -48.202
`Business Support Policy` 603.83 634.57 634.57 677.86 677.865
`Labor Policy` 809.11 842.05 864.88 864.88 937.711
`Law & Order` -826.69 -737.49 -729.78 -637.65 -637.654
************************Diagnostic information*************************
Number of data points: 66
Effective number of parameters (2trace(S) - trace(S'S)): 10.32786
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 55.67214
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 1163.372
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 1145.841
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 1112.783
Residual sum of squares: 114549287
R-square value: 0.3841049
Adjusted R-square value: 0.267759
***********************************************************************
Program stops at: 2024-11-11 01:20:13.532168
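The AICc values printed by `gwr.basic()` put the global and local fits on a comparable scale (lower is better). For the global OLS fit, the printed AICc matches the standard small-sample correction AICc = AIC + 2k(k + 1) / (n - k - 1) with k = 11 (nine predictors, the intercept and the error variance) and n = 66, which the sketch below checks against the printed values. On this criterion alone, neither GWR fit for the project count improves on the global AICc of 1161.927 (fixed bandwidth: 1163.596; adaptive bandwidth: 1163.372). The small `lm()` example at the end uses simulated data.

```r
# Small-sample corrected AIC: AICc = AIC + 2k(k + 1) / (n - k - 1)
aicc_from_aic <- function(aic, k, n) aic + 2 * k * (k + 1) / (n - k - 1)

# Check against the global regression diagnostics printed by gwr.basic():
# 9 predictors + intercept + error variance => k = 11, n = 66
aicc_project <- aicc_from_aic(1157.038, k = 11, n = 66)  # printed AICc: 1161.927
aicc_capital <- aicc_from_aic(1396.117, k = 11, n = 66)  # printed AICc: 1401.006
print(round(c(project = aicc_project, capital = aicc_capital), 3))

# The same correction for any lm() fit, taking k from logLik()
set.seed(2)
d <- data.frame(x = rnorm(30))
d$y <- 1 + 2 * d$x + rnorm(30)
fit <- lm(y ~ x, data = d)
k <- attr(logLik(fit), "df")   # intercept, slope and sigma
print(aicc_from_aic(AIC(fit), k = k, n = nobs(fit)))
```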
bw.fixed_capital <- bw.gwr(formula = total_registered_capital ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
approach="CV",
kernel="bisquare",
adaptive=FALSE,
longlat=FALSE)
Fixed bandwidth: 968784.2 CV score: 7783319735
Fixed bandwidth: 598861.3 CV score: 8456546162
Fixed bandwidth: 1197409 CV score: 6907612208
Fixed bandwidth: 1338707 CV score: 6394763233
Fixed bandwidth: 1426034 CV score: 6166490448
Fixed bandwidth: 1480005 CV score: 6058968306
Fixed bandwidth: 1513361 CV score: 6003759868
Fixed bandwidth: 1533976 CV score: 5973544648
Fixed bandwidth: 1546717 CV score: 5956273318
Fixed bandwidth: 1554591 CV score: 5946105624
Fixed bandwidth: 1559458 CV score: 5940013121
Fixed bandwidth: 1562465 CV score: 5936321848
Fixed bandwidth: 1564324 CV score: 5934068480
Fixed bandwidth: 1565473 CV score: 5932686416
Fixed bandwidth: 1566183 CV score: 5931836280
Fixed bandwidth: 1566622 CV score: 5931312400
Fixed bandwidth: 1566893 CV score: 5930989208
Fixed bandwidth: 1567061 CV score: 5930789688
Fixed bandwidth: 1567164 CV score: 5930666463
Fixed bandwidth: 1567228 CV score: 5930590338
Fixed bandwidth: 1567268 CV score: 5930543303
Fixed bandwidth: 1567292 CV score: 5930514238
Fixed bandwidth: 1567308 CV score: 5930496277
Fixed bandwidth: 1567317 CV score: 5930485177
Fixed bandwidth: 1567323 CV score: 5930478317
Fixed bandwidth: 1567326 CV score: 5930474077
Fixed bandwidth: 1567328 CV score: 5930471457
Fixed bandwidth: 1567330 CV score: 5930469838
Fixed bandwidth: 1567331 CV score: 5930468837
Fixed bandwidth: 1567331 CV score: 5930468219
Fixed bandwidth: 1567331 CV score: 5930467836
Fixed bandwidth: 1567332 CV score: 5930467600
Fixed bandwidth: 1567332 CV score: 5930467454
Fixed bandwidth: 1567332 CV score: 5930467364
Fixed bandwidth: 1567332 CV score: 5930467308
Fixed bandwidth: 1567332 CV score: 5930467274
Fixed bandwidth: 1567332 CV score: 5930467252
Fixed bandwidth: 1567332 CV score: 5930467239
Fixed bandwidth: 1567332 CV score: 5930467231
Fixed bandwidth: 1567332 CV score: 5930467226
Fixed bandwidth: 1567332 CV score: 5930467223
Fixed bandwidth: 1567332 CV score: 5930467221
Fixed bandwidth: 1567332 CV score: 5930467220
Fixed bandwidth: 1567332 CV score: 5930467219
Fixed bandwidth: 1567332 CV score: 5930467219
Fixed bandwidth: 1567332 CV score: 5930467218
Fixed bandwidth: 1567332 CV score: 5930467218
Fixed bandwidth: 1567332 CV score: 5930467218
Fixed bandwidth: 1567332 CV score: 5930467218
Fixed bandwidth: 1567332 CV score: 5930467218
gwr.fixed_capital <- gwr.basic(formula = total_registered_capital ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
bw=bw.fixed_capital,
kernel = 'bisquare',
longlat = FALSE)
gwr.fixed_capital
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2024-11-11 01:20:13.600976
Call:
gwr.basic(formula = total_registered_capital ~ `Entry Costs` +
`Land Access` + Transparency + `Time Costs` + `Informal charges` +
Proactivity + `Business Support Policy` + `Labor Policy` +
`Law & Order`, data = pci_2021, bw = bw.fixed_capital, kernel = "bisquare",
longlat = FALSE)
Dependent (y) variable: total_registered_capital
Independent variables: Entry Costs Land Access Transparency Time Costs Informal charges Proactivity Business Support Policy Labor Policy Law & Order
Number of data points: 66
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-14578 -6219 -1215 5880 22546
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -34440 26619 -1.294 0.201028
`Entry Costs` -4289 2211 -1.940 0.057402 .
`Land Access` 2175 2772 0.785 0.435874
Transparency 428 2002 0.214 0.831530
`Time Costs` 5400 2082 2.593 0.012103 *
`Informal charges` -3822 2374 -1.610 0.112990
Proactivity -2636 2414 -1.092 0.279523
`Business Support Policy` 5736 1491 3.847 0.000309 ***
`Labor Policy` 6999 1715 4.081 0.000144 ***
`Law & Order` -2987 2720 -1.098 0.276892
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8718 on 56 degrees of freedom
Multiple R-squared: 0.58
Adjusted R-squared: 0.5125
F-statistic: 8.591 on 9 and 56 DF, p-value: 6.006e-08
***Extra Diagnostic information
Residual sum of squares: 4256608383
Sigma(hat): 8155.336
AIC: 1396.117
AICc: 1401.006
BIC: 1400.29
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: bisquare
Fixed bandwidth: 1567332
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu. Max.
Intercept -39656.036 -37709.358 -36511.735 -35259.621 -34262.35
`Entry Costs` -7766.261 -6520.588 -4431.855 -2230.021 -1482.41
`Land Access` -43.236 692.711 2153.727 3568.962 4334.98
Transparency -696.951 -242.575 385.228 604.899 721.93
`Time Costs` 3700.055 4149.172 5788.037 7345.897 8232.04
`Informal charges` -3779.846 -3549.856 -3434.954 -3209.836 -2711.26
Proactivity -4031.731 -3460.458 -2426.955 -890.564 -297.88
`Business Support Policy` 3464.883 4153.988 5602.217 5987.741 6071.96
`Labor Policy` 6390.413 6490.595 6778.570 7458.271 8111.08
`Law & Order` -4128.646 -3918.244 -3182.281 -2150.035 -1610.82
************************Diagnostic information*************************
Number of data points: 66
Effective number of parameters (2trace(S) - trace(S'S)): 17.98284
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 48.01716
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 1406.215
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 1377.055
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 1360.196
Residual sum of squares: 3523822174
R-square value: 0.6522746
Adjusted R-square value: 0.5192787
***********************************************************************
Program stops at: 2024-11-11 01:20:13.625485
bw.adaptive_capital <- bw.gwr(formula = total_registered_capital ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
approach="CV",
kernel="bisquare",
adaptive=TRUE,
longlat=FALSE)
Adaptive bandwidth: 48 CV score: 7552329470
Adaptive bandwidth: 38 CV score: 8078226145
Adaptive bandwidth: 56 CV score: 7245214023
Adaptive bandwidth: 59 CV score: 6996709773
Adaptive bandwidth: 63 CV score: 6680191144
Adaptive bandwidth: 63 CV score: 6680191144
gwr.adaptive_capital <- gwr.basic(formula = total_registered_capital ~
`Entry Costs` + `Land Access` + Transparency +
`Time Costs` + `Informal charges` + Proactivity +
`Business Support Policy` + `Labor Policy` +
`Law & Order`,
data=pci_2021,
bw=bw.adaptive_capital,
kernel = 'bisquare',
adaptive=TRUE,
longlat = FALSE)
gwr.adaptive_capital
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2024-11-11 01:20:13.671797
Call:
gwr.basic(formula = total_registered_capital ~ `Entry Costs` +
`Land Access` + Transparency + `Time Costs` + `Informal charges` +
Proactivity + `Business Support Policy` + `Labor Policy` +
`Law & Order`, data = pci_2021, bw = bw.adaptive_capital,
kernel = "bisquare", adaptive = TRUE, longlat = FALSE)
Dependent (y) variable: total_registered_capital
Independent variables: Entry Costs Land Access Transparency Time Costs Informal charges Proactivity Business Support Policy Labor Policy Law & Order
Number of data points: 66
***********************************************************************
* Results of Global Regression *
***********************************************************************
Call:
lm(formula = formula, data = data)
Residuals:
Min 1Q Median 3Q Max
-14578 -6219 -1215 5880 22546
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -34440 26619 -1.294 0.201028
`Entry Costs` -4289 2211 -1.940 0.057402 .
`Land Access` 2175 2772 0.785 0.435874
Transparency 428 2002 0.214 0.831530
`Time Costs` 5400 2082 2.593 0.012103 *
`Informal charges` -3822 2374 -1.610 0.112990
Proactivity -2636 2414 -1.092 0.279523
`Business Support Policy` 5736 1491 3.847 0.000309 ***
`Labor Policy` 6999 1715 4.081 0.000144 ***
`Law & Order` -2987 2720 -1.098 0.276892
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8718 on 56 degrees of freedom
Multiple R-squared: 0.58
Adjusted R-squared: 0.5125
F-statistic: 8.591 on 9 and 56 DF, p-value: 6.006e-08
***Extra Diagnostic information
Residual sum of squares: 4256608383
Sigma(hat): 8155.336
AIC: 1396.117
AICc: 1401.006
BIC: 1400.29
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: bisquare
Adaptive bandwidth: 63 (number of nearest neighbours)
Regression points: the same locations as observations are used.
Distance metric: Euclidean distance metric is used.
****************Summary of GWR coefficient estimates:******************
Min. 1st Qu. Median 3rd Qu. Max.
Intercept -41074.10 -39734.01 -37621.80 -36852.31 -32049.862
`Entry Costs` -7951.74 -7491.97 -4447.79 -1403.80 -1145.634
`Land Access` -329.15 -286.28 2032.35 4158.50 4446.563
Transparency -750.65 -616.85 507.46 744.27 959.962
`Time Costs` 3551.50 3800.16 6762.80 8192.55 8344.559
`Informal charges` -3997.21 -3449.04 -3218.23 -2800.56 -2663.778
Proactivity -4250.08 -3779.54 -2332.51 -180.23 0.763
`Business Support Policy` 3210.83 3227.38 4896.51 5840.28 5912.261
`Labor Policy` 5492.11 6446.88 6512.91 7886.25 8236.279
`Law & Order` -4258.17 -4181.16 -3330.83 -1394.81 -1358.757
************************Diagnostic information*************************
Number of data points: 66
Effective number of parameters (2trace(S) - trace(S'S)): 22.038
Effective degrees of freedom (n-2trace(S) + trace(S'S)): 43.962
AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 1413.956
AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 1374.151
BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 1368.891
Residual sum of squares: 3191366879
R-square value: 0.6850808
Adjusted R-square value: 0.5235383
***********************************************************************
Program stops at: 2024-11-11 01:20:13.697176
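Each row of the "Summary of GWR coefficient estimates" table above summarises 66 local regressions, one per province. A minimal sketch of a single local fit on synthetic data shows that the local coefficients are weighted least-squares estimates, beta_i = (X'W_i X)^(-1) X'W_i y, which `lm()` with a `weights` argument reproduces. The fixed Gaussian kernel and the bandwidth of 0.3 are assumptions chosen only for illustration.

```r
set.seed(2024)
n <- 50
coords <- cbind(runif(n), runif(n))
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

i <- 1                                                # regression point
d_i <- sqrt(colSums((t(coords) - coords[i, ])^2))     # distances from point i
w_i <- exp(-0.5 * (d_i / 0.3)^2)                      # Gaussian kernel, bandwidth 0.3 (assumed)

# Local coefficients from the weighted normal equations
X <- cbind(Intercept = 1, x1 = dat$x1, x2 = dat$x2)
XtW <- t(X * w_i)                                     # equals t(X) %*% diag(w_i)
beta_matrix <- drop(solve(XtW %*% X, XtW %*% dat$y))

# The same estimate via lm() with observation weights
beta_lm <- coef(lm(y ~ x1 + x2, data = dat, weights = w_i))
all.equal(unname(beta_matrix), unname(beta_lm))      # TRUE
```

Repeating this at every regression point, with weights from the chosen kernel and bandwidth, produces the spread of local estimates that GWR reports.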
Visualising Local R²
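# Convert the GWR output (SpatialPolygonsDataFrame) into an sf data.frame and
# project it to EPSG:3405 (VN-2000 / UTM zone 48N); this reassigns pci_2021
pci_2021 <- st_as_sf(gwr.fixed_project$SDF) %>%
  st_transform(crs = 3405)
gwr.fixed.output_project <- as.data.frame(gwr.fixed_project$SDF)
# cbind() appends the SDF columns a second time, which is why glimpse() below
# lists every GWR field twice (the ".1" list-columns)
pci_2021.fixed_project <- cbind(pci_2021, as.matrix(gwr.fixed.output_project))
glimpse(pci_2021.fixed_project)
Rows: 66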
Columns: 74
$ Intercept <dbl> -2029.95467, -932.37899, 894.71344, -6…
$ X.Entry.Costs. <dbl> -407.7129, -478.9765, -409.1094, -399.…
$ X.Land.Access. <dbl> 529.76677, -54.62115, -141.09552, 481.…
$ Transparency <dbl> -325.60467, -32.50034, -166.54523, -37…
$ X.Time.Costs. <dbl> 353.1721, 450.6535, 411.2222, 396.8354…
$ X.Informal.charges. <dbl> -411.3981, -193.0164, -231.4254, -546.…
$ Proactivity <dbl> -153.73344, -12.23230, -14.30985, -161…
$ X.Business.Support.Policy. <dbl> 732.5428, 412.1067, 339.5726, 707.5992…
$ X.Labor.Policy. <dbl> 828.7843, 664.5272, 677.9096, 852.7153…
$ X.Law...Order. <dbl> -713.1389, -467.7242, -469.8577, -724.…
$ y <dbl> 31, 595, 4, 15, 1820, 65, 99, 4073, 41…
$ yhat <dbl> -400.13304, 249.19696, 174.39859, 90.2…
$ residual <dbl> 431.13304, 345.80304, -170.39859, -75.…
$ CV_Score <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Stud_residual <dbl> 0.37323197, 0.28012187, -0.17698405, -…
$ Intercept_SE <dbl> 5290.157, 4717.062, 5085.611, 7461.838…
$ X.Entry.Costs._SE <dbl> 508.1250, 419.8693, 446.7241, 585.5165…
$ X.Land.Access._SE <dbl> 525.4821, 520.0565, 559.0493, 673.0689…
$ Transparency_SE <dbl> 404.8617, 430.3879, 462.4965, 432.1743…
$ X.Time.Costs._SE <dbl> 380.3176, 417.3923, 485.3297, 426.4734…
$ X.Informal.charges._SE <dbl> 441.0214, 491.4422, 594.5691, 520.9128…
$ Proactivity_SE <dbl> 522.4891, 455.9661, 515.8280, 557.3436…
$ X.Business.Support.Policy._SE <dbl> 294.3359, 305.0581, 344.2616, 366.8382…
$ X.Labor.Policy._SE <dbl> 402.7943, 349.9538, 374.2111, 433.6041…
$ X.Law...Order._SE <dbl> 515.3410, 542.8703, 603.1861, 599.5543…
$ Intercept_TV <dbl> -0.383722975, -0.197660957, 0.17593036…
$ X.Entry.Costs._TV <dbl> -0.8023870, -1.1407754, -0.9157988, -0…
$ X.Land.Access._TV <dbl> 1.00815386, -0.10502926, -0.25238478, …
$ Transparency_TV <dbl> -0.80423678, -0.07551406, -0.36010053,…
$ X.Time.Costs._TV <dbl> 0.9286239, 1.0796883, 0.8473048, 0.930…
$ X.Informal.charges._TV <dbl> -0.9328302, -0.3927550, -0.3892321, -1…
$ Proactivity_TV <dbl> -0.29423281, -0.02682720, -0.02774152,…
$ X.Business.Support.Policy._TV <dbl> 2.4887983, 1.3509118, 0.9863797, 1.928…
$ X.Labor.Policy._TV <dbl> 2.057587, 1.898900, 1.811570, 1.966576…
$ X.Law...Order._TV <dbl> -1.3838194, -0.8615763, -0.7789598, -1…
$ Local_R2 <dbl> 0.3754886, 0.4586589, 0.5009512, 0.391…
$ Intercept.1 <named list> -2029.955, -932.379, 894.7134, …
$ X.Entry.Costs..1 <named list> -407.7129, -478.9765, -409.1094…
$ X.Land.Access..1 <named list> 529.7668, -54.62115, -141.0955,…
$ Transparency.1 <named list> -325.6047, -32.50034, -166.5452…
$ X.Time.Costs..1 <named list> 353.1721, 450.6535, 411.2222, 3…
$ X.Informal.charges..1 <named list> -411.3981, -193.0164, -231.4254…
$ Proactivity.1 <named list> -153.7334, -12.2323, -14.30985,…
$ X.Business.Support.Policy..1 <named list> 732.5428, 412.1067, 339.5726, 7…
$ X.Labor.Policy..1 <named list> 828.7843, 664.5272, 677.9096, 8…
$ X.Law...Order..1 <named list> -713.1389, -467.7242, -469.8577…
$ y.1 <named list> 31, 595, 4, 15, 1820, 65, 99, 4…
$ yhat.1 <named list> -400.133, 249.197, 174.3986, 90…
$ residual.1 <named list> 431.133, 345.803, -170.3986, -7…
$ CV_Score.1 <named list> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Stud_residual.1 <named list> 0.373232, 0.2801219, -0.176984,…
$ Intercept_SE.1 <named list> 5290.157, 4717.062, 5085.611, 7…
$ X.Entry.Costs._SE.1 <named list> 508.125, 419.8693, 446.7241, 58…
$ X.Land.Access._SE.1 <named list> 525.4821, 520.0565, 559.0493, 6…
$ Transparency_SE.1 <named list> 404.8617, 430.3879, 462.4965, 4…
$ X.Time.Costs._SE.1 <named list> 380.3176, 417.3923, 485.3297, 4…
$ X.Informal.charges._SE.1 <named list> 441.0214, 491.4422, 594.5691, 5…
$ Proactivity_SE.1 <named list> 522.4891, 455.9661, 515.828, 55…
$ X.Business.Support.Policy._SE.1 <named list> 294.3359, 305.0581, 344.2616, 3…
$ X.Labor.Policy._SE.1 <named list> 402.7943, 349.9538, 374.2111, 4…
$ X.Law...Order._SE.1 <named list> 515.341, 542.8703, 603.1861, 59…
$ Intercept_TV.1 <named list> -0.383723, -0.197661, 0.1759304…
$ X.Entry.Costs._TV.1 <named list> -0.802387, -1.140775, -0.915798…
$ X.Land.Access._TV.1 <named list> 1.008154, -0.1050293, -0.252384…
$ Transparency_TV.1 <named list> -0.8042368, -0.07551406, -0.360…
$ X.Time.Costs._TV.1 <named list> 0.9286239, 1.079688, 0.8473048,…
$ X.Informal.charges._TV.1 <named list> -0.9328302, -0.392755, -0.38923…
$ Proactivity_TV.1 <named list> -0.2942328, -0.0268272, -0.0277…
$ X.Business.Support.Policy._TV.1 <named list> 2.488798, 1.350912, 0.9863797, …
$ X.Labor.Policy._TV.1 <named list> 2.057587, 1.8989, 1.81157, 1.96…
$ X.Law...Order._TV.1 <named list> -1.383819, -0.8615763, -0.77895…
$ Local_R2.1 <named list> 0.3754886, 0.4586589, 0.5009512…
$ geometry.1 <named list> [MULTIPOLYGON (((519993.2 12...…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((519993.2 …
# Set tmap options to check and fix any invalid polygons
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tmap mode set to interactive viewing
str(pci_2021.fixed_project$Local_R2)
 num [1:66] 0.375 0.459 0.501 0.392 0.45 ...
# Local_R2 is already numeric (see str()); unlist() guards against a list-column
pci_2021.fixed_project$Local_R2 <- unlist(pci_2021.fixed_project$Local_R2)
tm_shape(provincial_boundaries) +
  tm_polygons(alpha = 0.1) +
  tm_shape(pci_2021.fixed_project) +
  tm_polygons(col = "Local_R2",
              border.col = "gray60",
              border.lwd = 1) +
  tm_view(set.zoom.limits = c(5, 8))
Warning: The shape provincial_boundaries is invalid (after reprojection). See
sf::st_is_valid
Warning: The shape pci_2021.fixed_project is invalid (after reprojection). See
sf::st_is_valid
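The map above shades each province by its `Local_R2`. The sketch below computes one common form of local R², the kernel-weighted share of variation in y explained by the local fit at each point, and checks it against the R² that `lm()` reports for the same weighted regression. It uses synthetic data and an assumed adaptive bisquare kernel with k = 30. GWmodel's internal formula for `Local_R2` may differ in detail, so treat this as an illustration of the idea rather than a reproduction. The helper `local_r2()` is hypothetical.

```r
set.seed(99)
n <- 66
coords <- cbind(runif(n, 0, 100), runif(n, 0, 100))
x <- rnorm(n)
y <- 5 + (coords[, 2] / 25) * x + rnorm(n)        # slope varies from south to north

local_r2 <- function(i, k = 30) {
  d <- sqrt(colSums((t(coords) - coords[i, ])^2))
  b <- sort(d)[k]
  w <- ifelse(d < b, (1 - (d / b)^2)^2, 0)        # adaptive bisquare weights
  fit <- lm(y ~ x, weights = w)                   # local weighted regression
  yhat <- fitted(fit)
  ybar_w <- sum(w * y) / sum(w)                   # weighted mean of y
  rss_w <- sum(w * (y - yhat)^2)                  # weighted residual sum of squares
  tss_w <- sum(w * (y - ybar_w)^2)                # weighted total sum of squares
  c(manual = 1 - rss_w / tss_w, lm = summary(fit)$r.squared)
}

r2 <- t(sapply(seq_len(n), local_r2))
summary(r2[, "manual"])
```

Mapping `r2[, "manual"]` by location would give the same kind of picture as the `Local_R2` map: higher values where the local model explains more of the nearby variation.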
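# Convert the GWR output (SpatialPolygonsDataFrame) into an sf data.frame and
# project it to EPSG:3405 (VN-2000 / UTM zone 48N); this reassigns pci_2021
pci_2021 <- st_as_sf(gwr.adaptive_capital$SDF) %>%
  st_transform(crs = 3405)
gwr.adaptive.output_capital <- as.data.frame(gwr.adaptive_capital$SDF)
# cbind() appends the SDF columns a second time (the ".1" list-columns below)
pci_2021.adaptive_capital <- cbind(pci_2021, as.matrix(gwr.adaptive.output_capital))
glimpse(pci_2021.adaptive_capital)
Rows: 66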
Columns: 74
$ Intercept <dbl> -36357.19, -39500.61, -39839.53, -3680…
$ X.Entry.Costs. <dbl> -7683.232, -1420.801, -1339.037, -7873…
$ X.Land.Access. <dbl> 4319.8435, -299.3851, -319.7592, 4399.…
$ Transparency <dbl> -665.91049, 794.51327, 753.97957, -730…
$ X.Time.Costs. <dbl> 8266.566, 3794.920, 3697.926, 8322.321…
$ X.Informal.charges. <dbl> -2735.104, -3462.981, -3347.872, -2687…
$ Proactivity <dbl> -4111.6141396, -221.8046909, -182.9868…
$ X.Business.Support.Policy. <dbl> 5912.261, 3218.825, 3231.670, 5866.189…
$ X.Labor.Policy. <dbl> 8019.760, 6507.622, 6489.537, 8176.664…
$ X.Law...Order. <dbl> -4236.344, -1369.919, -1383.720, -4160…
$ y <dbl> 317.31, 9408.00, 7.90, 4496.04, 23317.…
$ yhat <dbl> 3102.7876, 3251.9995, 1321.5255, -957.…
$ residual <dbl> -2785.47762, 6156.00054, -1313.62546, …
$ CV_Score <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Stud_residual <dbl> -0.457111938, 0.916567017, -0.23380761…
$ Intercept_SE <dbl> 41159.58, 31226.01, 31673.08, 41965.30…
$ X.Entry.Costs._SE <dbl> 3467.347, 2747.628, 2789.168, 3505.653…
$ X.Land.Access._SE <dbl> 3903.781, 3521.329, 3592.957, 3965.241…
$ Transparency_SE <dbl> 2632.882, 2933.076, 2986.650, 2659.684…
$ X.Time.Costs._SE <dbl> 2598.653, 2912.814, 2974.442, 2618.904…
$ X.Informal.charges._SE <dbl> 3115.716, 3501.277, 3617.090, 3153.564…
$ Proactivity_SE <dbl> 3349.085, 3091.690, 3133.565, 3385.799…
$ X.Business.Support.Policy._SE <dbl> 2132.031, 2034.294, 2078.796, 2156.493…
$ X.Labor.Policy._SE <dbl> 2549.012, 2279.120, 2313.595, 2578.758…
$ X.Law...Order._SE <dbl> 3516.913, 3713.756, 3820.423, 3552.740…
$ Intercept_TV <dbl> -0.8833225, -1.2649906, -1.2578360, -0…
$ X.Entry.Costs._TV <dbl> -2.2158823, -0.5171011, -0.4800848, -2…
$ X.Land.Access._TV <dbl> 1.10657936, -0.08502047, -0.08899612, …
$ Transparency_TV <dbl> -0.25292076, 0.27088054, 0.25244996, -…
$ X.Time.Costs._TV <dbl> 3.181097, 1.302836, 1.243233, 3.177788…
$ X.Informal.charges._TV <dbl> -0.8778413, -0.9890621, -0.9255707, -0…
$ Proactivity_TV <dbl> -1.227682827, -0.071742205, -0.0583957…
$ X.Business.Support.Policy._TV <dbl> 2.773065, 1.582281, 1.554588, 2.720245…
$ X.Labor.Policy._TV <dbl> 3.146222, 2.855323, 2.804959, 3.170775…
$ X.Law...Order._TV <dbl> -1.2045632, -0.3688769, -0.3621902, -1…
$ Local_R2 <dbl> 0.7322584, 0.6414571, 0.6553979, 0.741…
$ Intercept.1 <named list> -36357.19, -39500.61, -39839.53…
$ X.Entry.Costs..1 <named list> -7683.232, -1420.801, -1339.037…
$ X.Land.Access..1 <named list> 4319.844, -299.3851, -319.7592,…
$ Transparency.1 <named list> -665.9105, 794.5133, 753.9796, …
$ X.Time.Costs..1 <named list> 8266.566, 3794.92, 3697.926, 83…
$ X.Informal.charges..1 <named list> -2735.104, -3462.981, -3347.872…
$ Proactivity.1 <named list> -4111.614, -221.8047, -182.9869…
$ X.Business.Support.Policy..1 <named list> 5912.261, 3218.825, 3231.67, 58…
$ X.Labor.Policy..1 <named list> 8019.76, 6507.622, 6489.537, 81…
$ X.Law...Order..1 <named list> -4236.344, -1369.919, -1383.72,…
$ y.1 <named list> 317.31, 9408, 7.9, 4496.04, 233…
$ yhat.1 <named list> 3102.788, 3251.999, 1321.525, -…
$ residual.1 <named list> -2785.478, 6156.001, -1313.625,…
$ CV_Score.1 <named list> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Stud_residual.1 <named list> -0.4571119, 0.916567, -0.233807…
$ Intercept_SE.1 <named list> 41159.58, 31226.01, 31673.08, 4…
$ X.Entry.Costs._SE.1 <named list> 3467.347, 2747.628, 2789.168, 3…
$ X.Land.Access._SE.1 <named list> 3903.781, 3521.329, 3592.957, 3…
$ Transparency_SE.1 <named list> 2632.882, 2933.076, 2986.65, 26…
$ X.Time.Costs._SE.1 <named list> 2598.653, 2912.814, 2974.442, 2…
$ X.Informal.charges._SE.1 <named list> 3115.716, 3501.277, 3617.09, 31…
$ Proactivity_SE.1 <named list> 3349.085, 3091.69, 3133.565, 33…
$ X.Business.Support.Policy._SE.1 <named list> 2132.031, 2034.294, 2078.796, 2…
$ X.Labor.Policy._SE.1 <named list> 2549.012, 2279.12, 2313.595, 25…
$ X.Law...Order._SE.1 <named list> 3516.913, 3713.756, 3820.423, 3…
$ Intercept_TV.1 <named list> -0.8833225, -1.264991, -1.25783…
$ X.Entry.Costs._TV.1 <named list> -2.215882, -0.5171011, -0.48008…
$ X.Land.Access._TV.1 <named list> 1.106579, -0.08502047, -0.08899…
$ Transparency_TV.1 <named list> -0.2529208, 0.2708805, 0.25245,…
$ X.Time.Costs._TV.1 <named list> 3.181097, 1.302836, 1.243233, 3…
$ X.Informal.charges._TV.1 <named list> -0.8778413, -0.9890621, -0.9255…
$ Proactivity_TV.1 <named list> -1.227683, -0.0717422, -0.05839…
$ X.Business.Support.Policy._TV.1 <named list> 2.773065, 1.582281, 1.554588, 2…
$ X.Labor.Policy._TV.1 <named list> 3.146222, 2.855323, 2.804959, 3…
$ X.Law...Order._TV.1 <named list> -1.204563, -0.3688769, -0.36219…
$ Local_R2.1 <named list> 0.7322584, 0.6414571, 0.6553979…
$ geometry.1 <named list> [MULTIPOLYGON (((519993.2 12...…
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((519993.2 …
# Set tmap options to check and fix any invalid polygons
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tmap mode set to interactive viewing
str(pci_2021.adaptive_capital$Local_R2)
 num [1:66] 0.732 0.641 0.655 0.741 0.642 ...
# Local_R2 is already numeric (see str()); unlist() guards against a list-column
pci_2021.adaptive_capital$Local_R2 <- unlist(pci_2021.adaptive_capital$Local_R2)
tm_shape(provincial_boundaries) +
  tm_polygons(alpha = 0.1) +
  tm_shape(pci_2021.adaptive_capital) +
  tm_polygons(col = "Local_R2",
              border.col = "gray60",
              border.lwd = 1) +
  tm_view(set.zoom.limits = c(5, 8))
Warning: The shape provincial_boundaries is invalid (after reprojection). See
sf::st_is_valid
Warning: The shape pci_2021.adaptive_capital is invalid (after reprojection).
See sf::st_is_valid
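Both Local R² maps warn that `provincial_boundaries` and the GWR layers are invalid after reprojection, even with `check.and.fix = TRUE`. One option worth trying is to repair the geometries once with `sf::st_make_valid()` before mapping. The self-contained sketch below shows the idea on a small self-intersecting ("bow-tie") polygon rather than on the provincial data.

```r
library(sf)

# A bow-tie ring crosses itself at (0.5, 0.5), which makes the polygon invalid
bowtie <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)))))
st_is_valid(bowtie)                 # FALSE: self-intersection

# Repair the geometry; the result covers the same two triangles
repaired <- st_make_valid(bowtie)
st_is_valid(repaired)               # TRUE
sum(st_area(repaired))              # total area of the two triangles
```

Applied to this exercise, that would be, for example, `provincial_boundaries <- st_make_valid(provincial_boundaries)`. It may not remove every warning, because tmap re-checks validity after reprojecting layers for the interactive view.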
Visualising coefficient estimates
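tmap_mode("view")
tmap mode set to interactive viewing
# Standard errors of the local Transparency coefficient.
# Note: this panel maps the fixed-bandwidth model of total projects, while the
# t-value panel maps the adaptive-bandwidth model of registered capital, so the
# two panels describe different models.
transparency_SE_map <- tm_shape(provincial_boundaries) +
  tm_polygons(alpha = 0.1) +
  tm_shape(pci_2021.fixed_project) +
  tm_polygons(col = "Transparency_SE",
              border.col = "gray60",
              border.lwd = 1) +
  tm_view(set.zoom.limits = c(5, 8))
# Local t-values of the Transparency coefficient
transparency_TV_map <- tm_shape(provincial_boundaries) +
  tm_polygons(alpha = 0.1) +
  tm_shape(pci_2021.adaptive_capital) +
  tm_polygons(col = "Transparency_TV",
              border.col = "gray60",
              border.lwd = 1) +
  tm_view(set.zoom.limits = c(5, 8))
# Show both maps side by side with synchronised panning and zooming
tmap_arrange(transparency_SE_map, transparency_TV_map,
             asp = 1, ncol = 2,
             sync = TRUE)
Warning: The shape provincial_boundaries is invalid (after reprojection). See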
sf::st_is_valid
Warning: The shape pci_2021.fixed_project is invalid (after reprojection). See
sf::st_is_valid
Warning: The shape provincial_boundaries is invalid (after reprojection). See
sf::st_is_valid
Warning: The shape pci_2021.adaptive_capital is invalid (after reprojection).
See sf::st_is_valid
Variable(s) "Transparency_TV" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
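The `_TV` columns mapped above are local t-values, each the local coefficient divided by its local standard error. A small check using the first province of the adaptive-bandwidth capital model, with values copied from its `glimpse()` output, confirms this and flags which local effects clear a two-sided 5% threshold. Using the effective degrees of freedom (43.962) for the critical value is an assumption of this sketch, and it does not adjust for testing 66 locations at once.

```r
# First province, adaptive-bandwidth model of total registered capital
est <- c(entry_costs = -7683.232, land_access = 4319.8435,
         business_support = 5912.261, labor_policy = 8019.760)
se  <- c(entry_costs = 3467.347, land_access = 3903.781,
         business_support = 2132.031, labor_policy = 2549.012)
reported_tv <- c(entry_costs = -2.2158823, land_access = 1.10657936,
                 business_support = 2.773065, labor_policy = 3.146222)

tv <- est / se                                   # local t-value = estimate / SE
all.equal(tv, reported_tv, tolerance = 1e-5)     # TRUE: matches the _TV columns

# Two-sided 5% critical value on the effective degrees of freedom (assumed)
crit <- qt(0.975, df = 43.962)
significant <- abs(tv) > crit
significant
```

Applying the same comparison to a whole `_TV` column would give a significant/non-significant layer that could be mapped alongside the t-values.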
6.0 Shiny Storyboard
6.1 Global Explanatory Model
6.1.1 Multiple Linear Regression (MLR)
Multiple Linear Regression (MLR) provides a baseline model to predict an outcome using multiple predictor variables. It assumes a consistent, global relationship across all data points, offering a broad understanding of how these variables influence the dependent variable overall.
Users have the flexibility to select specific combinations of predictor variables (e.g., PCI factors) to explore different modeling outcomes and compare how these combinations affect results.
Interpretation: A higher Adjusted R² indicates that the model explains more of the variation in the outcome after accounting for the number of predictors. The p-value of the overall F-test indicates whether the selected predictors jointly explain the outcome better than an intercept-only model; lower values give stronger evidence that they do.
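As a worked example of these two statistics, the sketch below recomputes Adjusted R² and the overall F-test from the global regression of `total_registered_capital` printed earlier (Multiple R² = 0.58, n = 66 provinces, p = 9 PCI predictors). The helper names `adj_r2()` and `f_stat()` are illustrative; the small difference from the printed F-statistic (8.591) comes from the rounded R².

```r
# Adjusted R^2 penalises R^2 for the number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
# Overall F-statistic: explained vs unexplained variance per degree of freedom
f_stat <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

r2 <- 0.58; n <- 66; p <- 9
adj <- adj_r2(r2, n, p)                          # 0.5125, matching the output
f   <- f_stat(r2, n, p)                          # about 8.59
p_value <- pf(f, df1 = p, df2 = n - p - 1, lower.tail = FALSE)
c(adjusted_r2 = adj, F = f, p_value = p_value)
```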

6.1.2 Stepwise Model Selection
Stepwise Model Selection allows users to refine the Multiple Linear Regression model by adding or removing predictor variables in a systematic way.
Users can select from 3 approaches:
forward selection (starting with no predictors and adding them),
backward elimination (starting with all predictors and removing them),
or both (a combination of adding and removing).
Users can also set the confidence level (e.g., 0.95 or 0.99), which controls how stringent the significance test for including a predictor is: a 0.95 level corresponds to a p-value threshold of 0.05, and 0.99 to a stricter threshold of 0.01.
Interpretation and Visualization: A radar chart provides a visual comparison of different models, allowing users to select and view how models perform across chosen predictors and confidence levels. This helps in identifying the most effective model based on the desired balance of predictors and statistical robustness.
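olsrr, loaded in the setup, provides stepwise selection helpers that the app likely builds on. The base-R sketch below is not that code: it only illustrates forward selection with a p-value entry threshold, assuming the threshold equals 1 - confidence level (0.05 for 0.95). The function `forward_select_p()` and the simulated data are hypothetical.

```r
# Forward selection: repeatedly add the candidate with the smallest p-value,
# stopping when no remaining candidate is significant at alpha
forward_select_p <- function(data, response, candidates, conf_level = 0.95) {
  alpha <- 1 - conf_level                        # entry threshold
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # p-value of each candidate when added to the current model
    p_vals <- vapply(remaining, function(v) {
      f <- reformulate(c(selected, v), response = response)
      coefs <- summary(lm(f, data = data))$coefficients
      coefs[v, "Pr(>|t|)"]
    }, numeric(1))
    best <- names(which.min(p_vals))
    if (p_vals[[best]] >= alpha) break
    selected <- c(selected, best)
  }
  selected
}

set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n)      # x3 is pure noise
selected <- forward_select_p(sim, "y", c("x1", "x2", "x3"), conf_level = 0.95)
selected
```

Backward elimination runs the same logic in reverse (start with all predictors, drop the least significant while its p-value exceeds alpha), and the "both" option alternates the two checks.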

6.1.3 Visualise Model Parameters
Visualize Model Parameters offers users an interactive way to examine and compare the effects of predictor variables across different stepwise models. Users can select their preferred model from the stepwise selection results and sort parameter values in ascending or descending order for easier comparison.
Interpretation and Customization: This visualization provides a clear view of each predictor’s influence on the outcome variable within the selected model, helping users assess the relative importance of predictors. By sorting the parameters, users can quickly identify the most impactful variables or spot subtle differences across models, aiding in deeper analysis and model refinement.
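Sorting a model's coefficients is a small operation. As a sketch on simulated data (not the app's code), the coefficient table from `summary(lm(...))` can be ordered by estimate as below. Raw estimates are only directly comparable when the predictors are measured on a similar scale.

```r
set.seed(7)
n <- 100
dat <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
dat$y <- 2 * dat$a - 1 * dat$b + 0.5 * dat$c + rnorm(n)
fit <- lm(y ~ a + b + c, data = dat)

coef_tab <- summary(fit)$coefficients[-1, , drop = FALSE]   # drop the intercept
# Descending order of estimates; use decreasing = FALSE for ascending
sorted <- coef_tab[order(coef_tab[, "Estimate"], decreasing = TRUE),
                   c("Estimate", "Std. Error", "Pr(>|t|)")]
sorted
```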

6.2 Local Explanatory Model
6.2.1 Bandwidth Selection
Bandwidth Selection in Geographically Weighted Regression (GWR) allows users to fine-tune the model’s spatial sensitivity by choosing between fixed and adaptive bandwidth options.
Side-by-Side Comparison:
Users can compare fixed and adaptive bandwidths side by side to see how each type influences the spatial scale of the analysis:
Fixed bandwidth applies the same distance radius at every regression point, which suits evenly spaced data.
Adaptive bandwidth instead uses a set number of nearest neighbours, so the radius widens in sparse areas and narrows in dense ones, which helps where data density varies across the study area.
Selection Options for Each Bandwidth Type:
Approach: Users can choose between cross-validation (CV) and the corrected Akaike Information Criterion (AICc) to optimize the bandwidth.
Kernel Method: Users can select the kernel type (e.g., Gaussian, bisquare, or tricube) to define the shape and weighting of spatial influence around each data point, tailoring the model to the spatial structure of the data.
Purpose:
This flexible bandwidth selection process allows users to determine the optimal balance of local vs. global influence, enhancing model accuracy and providing insights into spatial patterns at different scales.
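To make the kernel and bandwidth options concrete, the sketch below writes out the standard Gaussian, bisquare and tricube kernels and contrasts a fixed bandwidth (one distance for every point) with an adaptive one (the distance to the k-th nearest neighbour; GWmodel reports adaptive bandwidths as a number of nearest neighbours). The points are simulated, and `kernel_weight()` and `adaptive_bw()` are hypothetical helpers.

```r
# Standard GWR kernels; d = distance, b = bandwidth
kernel_weight <- function(d, b, kernel = c("gaussian", "bisquare", "tricube")) {
  kernel <- match.arg(kernel)
  u <- d / b
  switch(kernel,
         gaussian = exp(-0.5 * u^2),                  # never exactly zero
         bisquare = ifelse(u < 1, (1 - u^2)^2, 0),    # zero at and beyond b
         tricube  = ifelse(u < 1, (1 - u^3)^3, 0))    # zero at and beyond b
}

# Adaptive bandwidth for point i: distance to its k-th nearest neighbour
adaptive_bw <- function(d_i, k) sort(d_i)[k]

set.seed(10)
dense  <- cbind(rnorm(40, 0, 1), rnorm(40, 0, 1))     # tight cluster near the origin
sparse <- cbind(runif(10, 20, 60), runif(10, 20, 60)) # scattered points far away
pts <- rbind(dense, sparse)
D <- as.matrix(dist(pts))

k <- 8
b_dense  <- adaptive_bw(D[1, ], k)      # point 1 lies in the dense cluster
b_sparse <- adaptive_bw(D[41, ], k)     # point 41 lies in the sparse area
c(dense = b_dense, sparse = b_sparse)   # the sparse point gets a wider bandwidth
```

With a fixed bandwidth, the same `b` would be passed to `kernel_weight()` for every point, so sparse points could end up with very few non-zero weights under the bisquare or tricube kernels.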

6.2.2 Visualise Local R²
Visualizing Local R² provides an in-depth look at how well the Geographically Weighted Regression (GWR) model explains variation in the outcome across different areas. This visualization highlights spatial differences in model performance, making it easier to identify regions where predictors more effectively capture local patterns.
Customization Options:
Bandwidth Type: Choose between fixed or adaptive bandwidth to control the scale of spatial influence.
Bandwidth Optimization Approach: Select cross-validation (CV) or the corrected Akaike Information Criterion (AICc) to determine the optimal bandwidth; CV targets out-of-sample predictive accuracy, while AICc balances goodness of fit against model complexity.
Kernel Method: Choose the kernel type (e.g., Gaussian, bisquare, or tricube) to set the shape and weighting of spatial influence, refining how local R² values are calculated across locations.
This flexible visualization helps users assess where the model has strong or weak explanatory power across the study area, offering insights into local model fit. By adjusting bandwidth, approach, and kernel, users can explore how these choices impact model performance, identifying areas with robust predictions and areas needing further investigation.
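The AICc option scores a bandwidth by goodness of fit penalised by effective model complexity, tr(S), the trace of the hat matrix. The sketch below codes the formula that the GWmodel output cites (Fotheringham et al. 2002, eq. 2.33), with sigma_hat^2 = RSS / n, and checks it on the global regression of `total_registered_capital`, where tr(S) is simply the 10 estimated coefficients. The `aicc()` helper is illustrative.

```r
# AICc = 2n ln(sigma_hat) + n ln(2 pi) + n (n + tr(S)) / (n - 2 - tr(S))
aicc <- function(rss, n, tr_s) {
  sigma_hat <- sqrt(rss / n)
  2 * n * log(sigma_hat) + n * log(2 * pi) + n * (n + tr_s) / (n - 2 - tr_s)
}

# Global regression: RSS = 4256608383, n = 66, 9 predictors + intercept
aicc(rss = 4256608383, n = 66, tr_s = 10)   # about 1401.006, as printed
```

For a GWR fit, tr(S) grows as the bandwidth shrinks, so the penalty term rises; lower AICc is better.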

6.5 Spatial Non-Stationarity Assumption
The Spatial Non-Stationarity Assumption tool helps assess whether relationships between variables vary across different spatial locations. In this analysis, it tests whether the influence of Provincial Competitiveness Index (PCI) dimensions on FDI metrics (Total Number of Projects and Total Registered Capital) remains consistent across provinces or if it changes depending on regional characteristics.
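A common way to test this assumption is a Monte Carlo randomisation test: shuffle the locations many times, refit the GWR, and count how often the local coefficients vary as much as they do with the true locations. GWmodel offers `gwr.montecarlo()` for this. The base-R sketch below only illustrates the logic on simulated data, assuming a fixed Gaussian kernel and using the standard deviation of the local estimates as the test statistic. The helpers `local_coefs()` and `mc_nonstationarity()` are hypothetical.

```r
# Local GWR coefficients (one row per location) for given coordinates
local_coefs <- function(coords, X, y, bw) {
  d <- as.matrix(dist(coords))
  t(vapply(seq_len(nrow(X)), function(i) {
    w <- exp(-0.5 * (d[i, ] / bw)^2)            # Gaussian kernel weights
    XtW <- t(X * w)                             # equals t(X) %*% diag(w)
    drop(solve(XtW %*% X, XtW %*% y))           # local weighted least squares
  }, numeric(ncol(X))))
}

# p-value per coefficient: share of location-shuffled fits whose local
# estimates vary at least as much (standard deviation) as the observed ones
mc_nonstationarity <- function(coords, X, y, bw, nsims = 99) {
  obs_sd <- apply(local_coefs(coords, X, y, bw), 2, sd)
  sim_sd <- replicate(nsims, {
    shuffled <- coords[sample(nrow(coords)), , drop = FALSE]
    apply(local_coefs(shuffled, X, y, bw), 2, sd)
  })
  (rowSums(sim_sd >= obs_sd) + 1) / (nsims + 1)
}

set.seed(1)
n <- 60
coords <- cbind(runif(n, 0, 10), runif(n, 0, 10))
x <- rnorm(n)
y <- 1 + coords[, 1] * x + rnorm(n, sd = 0.5)   # slope grows from west to east
X <- cbind(Intercept = 1, x = x)

p_values <- mc_nonstationarity(coords, X, y, bw = 2)
p_values
```

A small p-value for a coefficient (here the slope of `x`, simulated to increase from west to east) suggests that its relationship varies across space rather than being constant.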
